
Distributed Resource Management in Multi-hop Cognitive
Radio Networks for Delay Sensitive Transmission

Hsien-Po Shiang and Mihaela van der Schaar

Department of Electrical Engineering (EE), University of California Los Angeles (UCLA), Los Angeles, CA

{hpshiang, mihaela}@ee.ucla.edu

Abstract 

In this paper, we investigate the problem of multi-user resource management in multi-hop cognitive radio networks
for delay-sensitive applications. Since the tolerable delay does not allow global information to be propagated back and
forth throughout the multi-hop network to a centralized decision maker, the source nodes and relays need to adapt
their actions (transmission frequency channel and route selections) in a distributed manner, based on local network
information. We propose a distributed resource management algorithm that allows network nodes to exchange
information and that explicitly considers the delays and cost of exchanging the network information over
multi-hop cognitive radio networks. In our paper, the term “cognitive” refers both to the capability of the network
nodes to achieve large spectral efficiencies by dynamically exploiting available frequency channels and to their
ability to learn the “environment” (the actions of interfering nodes) based on the designed information exchange.
The nodes compete with each other because of the mutual interference among neighboring nodes that use the same
frequency channel. Based on this, we adopt a multi-agent learning approach, adaptive fictitious play, which uses the
available interference information. We also discuss the tradeoff between the cost of the required information exchange
and the learning efficiency. The results show that, when the network resources are limited, our distributed resource
management approach improves the PSNR of multiple video streams by more than 3 dB compared with state-of-the-art
dynamic frequency channel/route selection approaches without learning capability.

Index Terms: distributed resource management, cognitive radio networks, multi-hop wireless networks,
multi-agent learning, delay sensitive applications.

I. Introduction

The demand for wireless spectrum has increased and will keep increasing rapidly in the foreseeable future
with the introduction of multimedia applications such as YouTube, peer-to-peer multimedia networks, and
distributed gaming. However, scanning through the radio spectrum reveals its inefficient occupancy in most
frequency channels. Hence, in 2002 the Federal Communications Commission (FCC) suggested [1]
improvements for spectrum usage that enable more efficient allocation of frequency channels to
license-exempt users without impacting the primary licensees. Based on this, cognitive radio networks [2][3]
were proposed, which enable wireless users to sense and learn the surrounding environment and
adapt their transmission strategies accordingly.

In such cognitive wireless environments, two main challenges arise. The first challenge is how to sense the
spectrum and model the behavior of the primary licensees in order to identify available frequency channels (spectrum
holes)1. The second challenge is how to manage the available spectrum resources among the license-exempt
users to satisfy their QoS requirements while limiting the interference to the primary licensees. In this paper, we
focus on the second problem, i.e., resource management, and rely on the existing literature for the first
challenge [4][5].

The majority of the resource management research in cognitive radio networks has focused on single-hop
wireless infrastructures [6]-[10]. In this paper, we focus on the resource management problem in the more
general setting of multi-hop cognitive radio networks. A key advantage of such flexible multi-hop infrastructures
is that the same infrastructure can be re-used and reconfigured to relay the content gathered by various
transmitting users (i.e., source nodes) to their receiving users (i.e., sink nodes). These users may have different
goals (application utilities, etc.) and may be located at various locations. The multi-hop infrastructure differs
from the single-hop case in three key respects. First, the network resources available to the users include not
only the vacant frequency channels (spectrum holes or spectrum opportunities [2][6]), as in the single-hop
case, but also the routes through the various network relays to the destination nodes. Second, the transmission
strategies need to be adapted not only at the source nodes, but also at the network relay nodes. In cognitive
radio networks, network nodes are generally capable of sensing the spectrum and modeling the behavior of the
primary users and thereby identifying the available spectrum holes. In multi-hop cognitive radio networks, the
network nodes also need to model the behavior of the other neighbor nodes (i.e., other secondary users) in
order to successfully optimize the routing decisions. In other words, network relays also require a learning
capability in a multi-hop cognitive radio network. Third, to learn and efficiently adapt their decisions over
time, the wireless nodes need accurate (timely) information about the channel conditions, interference
patterns and other nodes' transmission strategies. However, in a distributed setting such as a multi-hop cognitive
radio network, the information is decentralized, and thus there is a certain delay associated with gathering the
necessary information from the various network nodes. Hence, an effective solution for multi-hop cognitive
radio networks will need to trade off the “value” of having information about other nodes, in terms of its
utility impact, against the transmission overheads associated with gathering this information in a timely fashion
across different hops.

1 In wireless environments without primary licensees, such as the ISM bands, the first challenge does not arise, and
the main challenge is the resource management problem.

In this paper, we aim at learning the behaviors of interacting cognitive radio nodes that use a simple
interference graph (similar to the spectrum holes used in [6][8]) to sequentially adjust and optimize their
transmission strategies. We apply a multi-agent learning algorithm, fictitious play (FP) [15], to model the
behavior of neighbor nodes based on the information exchanged among the network nodes. We focus on
delay-sensitive applications such as real-time multimedia streaming, i.e., applications whose receiving users need
to get the transmitted information within a certain delay. Due to the informationally decentralized nature of
multi-hop wireless networks, a centralized resource management solution for these delay-constrained applications is
not practical [14], since the tolerable delay does not allow propagating information back and forth throughout the
network to a centralized decision maker. Moreover, the complexity and the information overhead of the
centralized optimization grow exponentially with the size of the network. The problem is further complicated by
the dynamic competition for wireless resources (spectrum) among the various wireless nodes (i.e., source
nodes and relays). A centralized optimization would require a large amount of processing time, and the collected
information would no longer be accurate by the time transmission decisions need to be made. Hence, a distributed
resource management solution is necessary, one that explicitly considers the availability of information, the
transmission overheads and incurred delays, as well as the value of this information in terms of its utility impact.
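As an illustration of the basic FP idea referenced above (classic fictitious play, not the adaptive variant developed later in this paper), the following Python sketch assumes that a node observes which channels its interfering neighbors used in past rounds, models them by the empirical frequency of those choices, and best-responds by picking the channel with the lowest empirical frequency of neighbor use. The function names, the pooling of all neighbors into a single frequency table, and the observation history are hypothetical.

```python
from collections import Counter


def update_beliefs(counts, observed_channels):
    """Record the channels that neighbors were observed using in this round."""
    for ch in observed_channels:
        counts[ch] += 1


def best_response(counts, channels):
    """Pick the channel with the lowest empirical frequency of neighbor use.

    Under fictitious play, opponents are modeled as playing a mixed strategy
    equal to the empirical frequency of their past actions. Here all neighbors
    are pooled into one frequency table (a simplifying assumption).
    """
    total = sum(counts.values())

    def empirical_use(ch):
        # Counter returns 0 for channels never observed, without inserting them.
        return counts[ch] / total if total else 0.0

    # Ties are broken by channel index so the choice is deterministic.
    return min(channels, key=lambda ch: (empirical_use(ch), ch))


channels = [0, 1, 2]
counts = Counter()
history = [[0, 1], [0], [0, 2], [1, 0]]  # hypothetical neighbor observations
for observed in history:
    update_beliefs(counts, observed)
choice = best_response(counts, channels)
print(dict(counts), choice)
```

Because the beliefs are plain running counts, each update is cheap; the price of this simplicity is that stale observations are never discounted, which matters in the time-varying settings discussed in Section II.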

The paper is organized as follows. In Section II, we discuss the main challenges of dynamic resource
management in multi-hop cognitive radio networks and review the related work. Section III describes the multi-hop
cognitive radio network settings and strategies, and Section IV gives the problem formulation of distributed
resource management for delay sensitive transmission in such networks. In Section V, we determine how to
quantify the rewards and costs associated with various information exchanges in multi-hop cognitive radio
networks. In Section VI, we propose our distributed resource management algorithms with information
exchange and introduce the adopted multi-agent learning approach, adaptive fictitious play. Simulation results are
presented in Section VII. Finally, Section VIII concludes the paper.

II. Main Challenges and Related Works

A.  Main  challenges  in  multi-hop  cognitive  radio  networks 

To design such a distributed resource management scheme for multi-hop cognitive radio networks, several main
challenges need to be addressed:

•  Dynamic  adaptation  to  a  time-varying  network  environment 

Multi-hop cognitive radio networks generally experience the following dynamics: 1) the primary users
directly affect the spectrum opportunities available to the secondary users, 2) the mobility of the network relays
affects the network topology, 3) the traffic load varies because multiple applications simultaneously
share the same network infrastructure, and 4) the wireless channel conditions vary over time. Given the
dynamic nature of cognitive radio networks, wireless nodes need to learn, dynamically self-organize and
strategically adapt their transmission strategies to the available resources without interfering with the primary
licensees. Due to these time-varying dynamics, the outcomes of these interactions need not converge to an
equilibrium, i.e., disequilibrium and perpetual adaptation of strategies may persist, as long as the performance of
the delay sensitive application is maximized [15]. Hence, repeated information exchange among network nodes
is required for nodes to efficiently learn and keep adapting to the changing network dynamics.

•  Information  availability  in  multi-hop  infrastructures 

Due  to  the  informationally-decentralized  nature  of  the  multi-hop  infrastructure,  the  exchanged  network 
information  is  only  useful  when  it  can  be  conveyed  in  time.  The  timeliness  constraint  of  the  information 
exchange  depends  on  the  delay  deadline  of  the  applications,  the  information  overhead,  and  the  condition  of  the 
network  links,  etc.  Hence,  the  value  of  information  in  terms  of  its  impact  on  the  users’  utilities  will  need  to  be 
quantified  for  the  different  settings  of  the  multi-hop  cognitive  radio  network.  This  information  will  impact  the 
accuracy  with  which  the  wireless  nodes  can  model  the  behavior  of  other  nodes  (including  the  primary  users)  and 
hence,  the  efficiency  with  which  they  can  respond  to  this  environment  by  adequately  optimizing  their 
transmission  strategies. 
B. Related works

Distributed dynamic spectrum allocation is an important issue in cognitive radio networks, and various
approaches have been proposed in recent years. In [8], decentralized cognitive MAC protocols are proposed
based on the theory of Partially Observable Markov Decision Processes (POMDP), where a secondary user is able
to model the primary users through Markovian state transition probabilities. In [9], the authors investigated a
game-theoretic spectrum sharing approach, where the primary users are willing to share spectrum and provide a
determined pricing function to the secondary users. In [10], a no-regret learning approach is proposed for
dynamic spectrum access in cognitive radio networks. However, these studies focus on dynamic spectrum
management for the single-hop network case.

Exploiting frequency diversity in wireless multi-hop networks has attracted considerable interest in recent years.
In [11], the authors propose a distributed allocation scheme of sub-carriers and power levels in orthogonal
frequency-division multiple-access (OFDMA)-based wireless mesh networks. They propose a fair scheduling
scheme that hierarchically decouples the sub-carrier and power allocation problem based on the limited local
information that is available at each node. In [12], the authors focus on distributed channel and route
assignment in heterogeneous multi-radio, multi-channel, multi-hop wireless networks. The proposed protocol
coordinates the channel and route selection at each node, based on the information exchanged among two-hop
neighbor nodes. However, these studies are not suitable for cognitive radio networks, since they ignore both the
dynamic nature of spectrum opportunities and the need for users (network nodes) to estimate the behavior of the
primary users in order to coexist with them. To the best of our knowledge, the dynamic resource management
problem in multi-hop cognitive radio networks has not been addressed in the literature.

In summary, the paper makes the following contributions.

a) We propose a dynamic resource management scheme for multi-hop cognitive radio networks based on
periodic information exchange among network nodes. Our approach allows each network node (secondary
users and relays) to exchange its spectrum opportunity information and select the optimal channel and next
relay to transmit delay sensitive packets.

b) We investigate how the information collected from various hops impacts the performance of the
distributed resource management scheme. We introduce the notion of an “information cell” to explicitly identify
the network nodes that can convey timely information. Importantly, we investigate the case in which the information
cell does not cover all the interfering neighbor nodes in the interference graph.

c) The proposed dynamic resource management algorithm applies FP [15], which allows the various nodes to learn
their spectrum opportunities from the information exchange and adapt their transmission strategies autonomously,
in a distributed manner. Moreover, we discuss the tradeoffs between the cost of the required information
exchange and the learning efficiency of the multi-agent learning approach in terms of the utility impact.

Next, we present the settings of the multi-hop cognitive radio networks considered in this paper.

III. Multi-hop Cognitive Radio Networks - Settings and Strategies

A.  Network  entities 

In  this  paper,  we  assume  that  a  multi-hop  cognitive  radio  network  involves  the  following  network  entities  and 
their  interactions: 

•  Primary Users (PUs) are the incumbent devices that possess transmission licenses for specific frequency
bands (channels). Without loss of generality, we assume that there are $M$ frequency channels in the
considered cognitive radio network. We also assume that the maximum number of primary users that can be
present in the network equals $M$. Note that each primary user can only occupy its assigned (licensed)
frequency channel and not other primary users' channels. Since the primary users are licensed users, they
are guaranteed an interference-free environment [2][4]. When a primary user is not transmitting data
using its assigned frequency channel, a spectrum hole is formed at the corresponding frequency channel.

•  Secondary Users (SUs) are the autonomous wireless stations that perform channel sensing and access the
existing spectrum holes in order to transmit their data. The secondary users can occupy the spectrum holes
available in the various frequency channels. In this paper, the secondary users deploy delay sensitive
applications. Specifically, we assume that there are $V$ delay sensitive applications simultaneously sharing
the cognitive radio network infrastructure, each having a unique source and destination node. These secondary
users are able to deploy their applications across various frequency channels and routes.

•  Network Relays (NRs) are autonomous wireless nodes that perform channel sensing and access the existing
spectrum holes in order to relay the received data to one of their neighboring nodes or SUs. Hence, unlike in
the SUs' case, no source or destination is present at the NRs. Note that multiple applications can use
the same NR using different frequency channels.

B.  Source  traffic  characteristics 

Let $V_i$ denote the delay sensitive application of the $i$-th SU. Assume that application $V_i$ consists of
packets in $K_i$ priority classes. The total number of applications is $V$. We assume that there are a total of
$K = \sum_{i=1}^{V} K_i + 1$ priority classes (i.e., $\mathcal{C} = \{C_1, \dots, C_K\}$). An additional priority class is added
because the highest priority class $C_1$ is reserved for the traffic of the primary users. The remaining classes
$C_k, k > 1$, can be characterized by:

•  $\lambda_k$, the impact factor of class $C_k$. For example, this factor can be obtained based on the money paid by a
user (different service levels can be assigned to different SUs by the cognitive radio network), based on the
distortion impact experienced by the application of each SU, or based on the tolerated delay assigned by the
applications. The classes of the delay sensitive applications are then prioritized based on this impact factor,
such that $\lambda_k > \lambda_{k'}$ if $k < k'$, $k, k' = 2, \dots, K$. The impact factor is encapsulated in the header (e.g., RTP header)
of each packet.

•  $D_k$, the delay deadline of the packets in class $C_k$. In this paper, a packet is regarded as useful for the delay
sensitive applications only when it is received before its delay deadline.

•  $L_k$, the average packet length in class $C_k$.

A variety of delay sensitive applications can use the cognitive radio set-up discussed in this paper; multimedia
transmission such as video streaming or video conferencing is one example of such applications [14]. We
assume in this paper that an application layer scheduler is implemented at each network node to send the most
important packet first, based on the impact factor encapsulated in the packet header.
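A minimal sketch of such an application layer scheduler, assuming each queued packet carries its impact factor $\lambda_k$ as a plain number, is a max-priority queue keyed on that factor. The class name, the packet labels and the impact-factor values below are hypothetical, and deadline-based dropping of late packets is not modeled.

```python
import heapq
import itertools


class PriorityScheduler:
    """Sends the queued packet with the largest impact factor first.

    Packets with equal impact factors are served in arrival (FIFO) order.
    """

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def enqueue(self, impact_factor, payload):
        # heapq is a min-heap, so the impact factor is negated to pop the largest first.
        heapq.heappush(self._heap, (-impact_factor, next(self._arrival), payload))

    def dequeue(self):
        """Return the most important payload, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, payload = heapq.heappop(self._heap)
        return payload

    def __len__(self):
        return len(self._heap)


sched = PriorityScheduler()
sched.enqueue(0.2, "B-frame")
sched.enqueue(0.9, "I-frame")
sched.enqueue(0.5, "P-frame")
sched.enqueue(0.9, "I-frame (2)")
order = [sched.dequeue() for _ in range(len(sched))]
print(order)
```

The monotonically increasing arrival counter breaks ties between equal impact factors without ever comparing payloads.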

C.  Multi-hop  cognitive  radio  network  specification 

We consider a multi-hop cognitive radio network, which is characterized by a general topology graph
$G(\mathcal{M}, \mathcal{N}, \mathcal{E})$ that has a set of primary users $\mathcal{M} = \{m_1, \dots, m_M\}$, a set of network nodes
$\mathcal{N} = \{n_1, \dots, n_N\}$ (including SUs and NRs) and a set of network edges (links) $\mathcal{E} = \{e_1, \dots, e_L\}$
(connecting the SUs and NRs). There are a total of $N$ nodes and $L$ links in this network. Each of these $N$ network
nodes is either a secondary user (as a source or a destination node) or a network relay.
We assume that $\mathcal{F} = \{f_1, \dots, f_M\}$ is the set of frequency channels in the network, where $M$ is the total
number of frequency channels. To avoid interfering with the primary users, the network nodes can only use
spectrum holes for transmission. Hence, to establish a link with its neighbor nodes, each network node $n \in \mathcal{N}$
can only use the available frequency channels in a set $\mathcal{F}_n \subseteq \mathcal{F}$. Note that the wireless nodes in a cognitive
radio network continuously sense the environment and exchange information, and hence $\mathcal{F}_n$ may change
over time depending on whether the primary users are transmitting in their assigned frequency channels.

The network resource for a network node $n \in \mathcal{N}$ of the multi-hop cognitive radio network includes the
routes composed of the various links and frequency channels. We define the resource matrix
$\mathbf{R}_n = [R_{ij}^n] \in \{0,1\}^{L \times M}$ for the network node $n$ as follows:

$$
R_{ij}^n =
\begin{cases}
1, & \text{if link } e_i \text{ is connected to node } n \text{ and frequency channel } f_j \text{ is available},\\
0, & \text{otherwise}.
\end{cases}
\tag{1}
$$

Whether or not the resource $R_{ij}^n$ is available to node $n \in \mathcal{N}$ depends not only on the topology
connectivity, but also on the interference from other traffic using the same frequency channel. Next, we discuss the
interference from other users (including the primary users).
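The construction in Eq. (1) can be sketched as follows, assuming links and channels are indexed from 0 and that node $n$ already knows which links it is attached to and which channels in $\mathcal{F}_n$ it currently senses as free. The function name and the example topology are hypothetical, and interference from other traffic is deliberately not included at this step.

```python
def resource_matrix(num_links, num_channels, links_at_node, available_channels):
    """Build R_n in {0,1}^{L x M} following Eq. (1).

    links_at_node: indices i of the links e_i connected to node n.
    available_channels: indices j of the channels f_j in F_n sensed as free by n.
    """
    links_at_node = set(links_at_node)
    available_channels = set(available_channels)
    return [
        [1 if (i in links_at_node and j in available_channels) else 0
         for j in range(num_channels)]
        for i in range(num_links)
    ]


# Hypothetical node attached to links e_1 and e_3 (indices 0 and 2)
# that currently senses only channel f_2 (index 1) as a spectrum hole.
R_n = resource_matrix(3, 2, links_at_node=[0, 2], available_channels=[1])
for row in R_n:
    print(row)
```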


D.  Interference  characterization 

Recall that the highest priority class $C_1$ is always reserved in each frequency channel for the traffic of the
primary users. The traffic of the SUs can be categorized into $K-1$ priority classes ($C_2, \dots, C_K$) for accessing
frequency channels. The priority of the traffic determines its ability to access the frequency channel. Primary users
in the highest priority class $C_1$ can always access their corresponding channels at any time, whereas the traffic of
the SUs can only access the spectrum holes for transmission. Hence, we define two types of interference to the
secondary users in the considered multi-hop cognitive radio network:

1)  Interference  from  primary  users. 

In practical cognitive networks, even though primary users have the highest priority, secondary users will
cause some level of interference to the primary users due to their imperfect awareness (sensing) of the
primary users. The primary users' interference depends on the locations of the $M$ primary users. We rely on
methods such as those in [5], which consider the power and location of the secondary users, to ensure that the
secondary users do not exceed some critical interference level to the primary users. We also assume that the
spectrum opportunity map is available to the secondary users as in [6][10]. Since a primary user blocks all the
neighboring links that use its frequency channel, a network node $n$ senses the channel and obtains the
Spectrum Opportunity Matrix (SOM) of the primary users:

$$
\mathbf{Z}_n = [Z_{ij}^n] \in \{0,1\}^{L \times M}, \quad
Z_{ij}^n =
\begin{cases}
1, & \text{if the primary user is occupying frequency channel } f_j \text{ and link } e_i \text{ can interfere with the primary user},\\
0, & \text{otherwise}.
\end{cases}
\tag{2}
$$
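Eq. (2) can likewise be sketched, assuming node $n$ knows which primary users are currently transmitting (and on which licensed channel) and which primary users each link could interfere with. Both inputs, the function name and the scenario are hypothetical; how these inputs are obtained through sensing is outside the scope of this sketch.

```python
def spectrum_opportunity_matrix(num_links, num_channels, active_pu_channel, link_interferes_with):
    """Build Z_n in {0,1}^{L x M} following Eq. (2).

    active_pu_channel: maps each currently transmitting primary user to the
        index of its licensed channel (a PU occupies only its own channel).
    link_interferes_with: maps a link index i to the set of primary users that
        a transmission on e_i would interfere with.
    """
    Z = [[0] * num_channels for _ in range(num_links)]
    for i in range(num_links):
        for pu in link_interferes_with.get(i, set()):
            if pu in active_pu_channel:  # idle PUs leave a spectrum hole
                Z[i][active_pu_channel[pu]] = 1
    return Z


# Hypothetical scenario: PU "m1" is transmitting on f_1 (index 0), "m2" is idle,
# link e_2 (index 1) can interfere with m1, and link e_3 (index 2) only with m2.
Z_n = spectrum_opportunity_matrix(
    num_links=3,
    num_channels=2,
    active_pu_channel={"m1": 0},
    link_interferes_with={1: {"m1"}, 2: {"m2"}},
)
print(Z_n)
```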


A simple example is illustrated in Figure 1, which indicates the SOM of the primary users and the resource
matrix of each network node in the multi-hop cognitive radio network.

[Figure 1: network with $\mathcal{F} = \{f_1, f_2\}$, $\mathcal{N} = \{n_1, n_2, n_3\}$, $\mathcal{E} = \{e_1, e_2, e_3\}$; panels: spectrum opportunity matrix of the primary users, and resource matrices $\mathbf{R}_1$, $\mathbf{R}_2$, $\mathbf{R}_3$ at each node.]

Fig. 1. A simple multi-hop cognitive radio network with three nodes and two frequency channels.

2) Interference from competing secondary users.

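We define $\mathbf{I}_k = [I_{ij}^k] \in \{0,1\}^{L \times M}$ as the Interference Matrix (IM) for the traffic in priority class $C_k, k \geq 2$:

$$
I_{ij}^k =
\begin{cases}
1, & \text{if link } e_i \text{ using frequency channel } f_j \text{ can be interfered with by the traffic of priority class } C_k,\\
0, & \text{otherwise}.
\end{cases}
\tag{3}
$$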


The interference caused by the traffic in priority class $C_k$ can be determined based on the interference graph
of the nodes that transmit the traffic (as in [10]). The interference graph identifies the links that are interfered
with by the transmission of the class $C_k$ traffic2. The IM can be computed through information exchange among
the neighbor nodes.

2 In a wireless environment, the transmissions of neighboring links can interfere with each other and significantly
impact their effective transmission times. Hence, the action of a node can impact and be impacted by the actions of
the other relay nodes. In order to coordinate these neighboring nodes, we construct the interference matrix with
binary “1” and “0” entries.
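The following sketch shows one possible way to fill an IM from an interference graph, assuming the graph is given as a map from each link to the set of links whose transmissions interfere with it, and that neighbors have reported the (link, channel) pairs currently carrying class $C_k$ traffic. This representation, the function name, and the choice to mark a busy link-channel pair itself as interfered are assumptions for illustration.

```python
def interference_matrix(num_links, num_channels, interference_graph, class_k_usage):
    """Build I_k in {0,1}^{L x M} following Eq. (3).

    interference_graph: maps link index i to the set of link indices whose
        transmissions interfere with e_i.
    class_k_usage: (link, channel) pairs on which class C_k traffic is reported
        (e.g., learned through information exchange with neighbor nodes).
    """
    I = [[0] * num_channels for _ in range(num_links)]
    for busy_link, ch in class_k_usage:
        # Links that are interfered with by a transmission on busy_link; the busy
        # link-channel pair itself is also marked (an assumption of this sketch).
        affected = {busy_link} | {
            i for i, interferers in interference_graph.items() if busy_link in interferers
        }
        for i in affected:
            I[i][ch] = 1
    return I


# Hypothetical line topology e_1 - e_2 - e_3 in which adjacent links conflict,
# with class C_k traffic reported on link e_1 using channel f_2.
graph = {0: {1}, 1: {0, 2}, 2: {1}}
I_k = interference_matrix(3, 2, graph, class_k_usage=[(0, 1)])
print(I_k)
```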

The available resource matrix can be masked by the SOM and the IMs of the higher priority classes, i.e.,

$$
\mathbf{R}_n^k = \mathbf{R}_n \otimes \bar{\mathbf{I}}_{k-1} \otimes \cdots \otimes \bar{\mathbf{I}}_2 \otimes \bar{\mathbf{Z}}_n,
$$

where $\otimes$ represents element-wise multiplication of the matrices and the bar denotes the inverse (binary complement)
operation, which turns 1 into 0 and 0 into 1. The resulting resource matrix $\mathbf{R}_n^k$ represents the available
resources around the network node $n$ for the class $C_k$ traffic under the interference of other higher priority traffic
(classes). Next, we define the actions available to the network nodes in a multi-hop cognitive radio network.
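The masking operation can be sketched with plain nested lists, assuming all matrices share the same $L \times M$ shape. Since element-wise multiplication is commutative, the order in which the complemented IMs and the complemented SOM are applied does not affect the result. The function names and example matrices are hypothetical.

```python
def complement(M):
    """Binary inverse: turns 1 into 0 and 0 into 1."""
    return [[1 - x for x in row] for row in M]


def hadamard(A, B):
    """Element-wise product of two equally sized binary matrices."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def available_resources(R_n, higher_class_ims, Z_n):
    """R_n^k = R_n * ~I_{k-1} * ... * ~I_2 * ~Z_n (element-wise).

    higher_class_ims: interference matrices of classes C_2, ..., C_{k-1}.
    """
    result = hadamard(R_n, complement(Z_n))
    for I in higher_class_ims:
        result = hadamard(result, complement(I))
    return result


R_n = [[1, 1], [0, 0], [1, 1]]
Z_n = [[0, 0], [0, 0], [1, 0]]  # an active PU on f_1 blocks link e_3
I_2 = [[0, 1], [0, 0], [0, 0]]  # class C_2 traffic interferes with (e_1, f_2)
R_n_k = available_resources(R_n, [I_2], Z_n)
print(R_n_k)
```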


E. Nodes' actions

We define the action of the network node $n$ for relaying the delay sensitive application $V_i$ as
$A_n = (e \in \mathcal{E}_n, f \in \mathcal{F}_n)$, where $\mathcal{E}_n \subseteq \mathcal{E}$ denotes the set of links connecting node $n$ to its
neighbor nodes, from which the network relay $n$ can select. Corresponding to the actions, we define the transmission
strategy vector of the network node $n$ as $\mathbf{s}_n = [s_A \mid A = (e \in \mathcal{E}_n, f \in \mathcal{F}_n)]$, where $s_A$ represents the
probability that the network node $n$ will choose action $A$. We refer to an action at node $n$ as a “feasible action” for
transmitting class $C_k$ traffic if $A = (e, f)$ is an “available resource” in $\mathbf{R}_n^k$ (i.e., element $R_{ef}^k = 1$ in
$\mathbf{R}_n^k$), since in this case the selected link and frequency channel do not interfere with the traffic in the higher
priority classes. That is,

$$
\mathcal{A}_n(k) = \{ A = (e, f) \mid \mathbf{R}_n^k = [R_{ef}^k]_{L \times M},\ R_{ef}^k = 1 \}.
\tag{4}
$$


We denote the set of all the feasible actions of node $n$ for class $C_k$ traffic as $\mathcal{A}_n(k)$. We next determine the
delay corresponding to different actions, taking into account the deployed cross-layer transmission strategies, by
computing the Effective Transmission Time (ETT) [19] over the transmission links.

Each network node $n$ computes the ETT $ETT_{nk}(e, f)$, with $e \in \mathcal{E}_n$, $f \in \mathcal{F}_n$, for transmitting delay
sensitive applications in priority class $C_k$:

$$
ETT_{nk}(e, f) = \frac{L_k}{T_n(e, f) \times (1 - p_n(e, f))}.
\tag{5}
$$


$T_n(e, f)$ and $p_n(e, f)$ represent the transmission rate and the packet error rate of the network node $n$ using
frequency channel $f$ over link $e$. $T_n(e, f)$ and $p_n(e, f)$ can be estimated by MAC/PHY layer link
adaptation [20]. Specifically, we assume that the channel condition of each link-frequency channel pair can be
modeled using a continuous-time Markov chain [17] with a finite set of states $\mathcal{S}_n(e, f)$. The time the channel
condition spends in state $i \in \mathcal{S}_n(e, f)$ is exponentially distributed with parameter $\nu_i$ (the rate of transition
out of state $i$, in transitions/sec). We assume that the maximum transition rate3 in the network is $\nu$, and the
variation of the channel conditions within a time interval $\tau < 1/\nu$ is regarded as negligible.
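Putting Eq. (4) and Eq. (5) together, a node can enumerate its feasible actions and evaluate their ETTs. The sketch below assumes the rate and packet error rate estimates are given as $L \times M$ tables that stay valid over an interval shorter than $1/\nu$. Choosing the minimum-ETT action per hop is only an illustrative greedy rule, not the learning-based selection proposed in this paper, and all names and numbers are hypothetical.

```python
def feasible_actions(R_n_k):
    """Eq. (4): all (link, channel) index pairs with R_ef^k = 1."""
    return [(e, f) for e, row in enumerate(R_n_k) for f, v in enumerate(row) if v == 1]


def ett(packet_len_bits, rate_bps, per):
    """Eq. (5): ETT = L_k / (T_n(e, f) * (1 - p_n(e, f))), in seconds."""
    if per >= 1.0:
        return float("inf")  # the packet is never delivered on this link-channel pair
    return packet_len_bits / (rate_bps * (1.0 - per))


def min_ett_action(R_n_k, rate, per, packet_len_bits):
    """Return the feasible action with the smallest ETT, or None if there is none."""
    actions = feasible_actions(R_n_k)
    if not actions:
        return None
    return min(actions, key=lambda a: ett(packet_len_bits, rate[a[0]][a[1]], per[a[0]][a[1]]))


R_n_k = [[1, 0], [0, 0], [0, 1]]
rate = [[2e6, 6e6], [1e6, 1e6], [1e6, 5e6]]  # hypothetical T_n(e, f) in bit/s
per = [[0.1, 0.0], [0.0, 0.0], [0.0, 0.5]]   # hypothetical p_n(e, f)
L_k = 8000                                    # hypothetical 1000-byte packet, in bits
best = min_ett_action(R_n_k, rate, per, L_k)
print(best, ett(L_k, rate[best[0]][best[1]], per[best[0]][best[1]]))
```

Note that the action with the highest raw rate, $(e_1, f_2)$ at 6 Mbit/s, is not even considered, because it is masked out in $\mathbf{R}_n^k$.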

Define the action vector $\mathbf{A}_i = [A_n \mid n \in \mathcal{N}]$ as the vector of the actions of all the network relay
nodes for transmitting $V_i$. Assume that the $i$-th delay sensitive application $V_i$ is transmitted from the source node
$n_i^s \in \mathcal{N}$ to the destination node $n_i^d \in \mathcal{N}$ with a total of $q_i$ packets. The routes of $V_i$ are denoted as
$\boldsymbol{\sigma}_i = \{\sigma_{ij} \mid j = 1, \dots, q_i\}$, where $\sigma_{ij}$ is the route of the $j$-th packet in $V_i$. A route $\sigma_{ij}$ is a set of
link-frequency pairs that the packets flow through, i.e.,

$$
\sigma_{ij} = \{ (e, f) \mid \text{the } j\text{-th packet of } V_i \text{ flows through link } e \text{ using frequency channel } f \}.
\tag{6}
$$

Note that if the action of a certain relay node changes, the corresponding route $\sigma_{ij}(\mathbf{A}_i)$ for relaying $V_i$ also
changes. We denote the end-to-end delay of the packets transmitted using the route $\sigma_{ij}(\mathbf{A}_i)$ as
$d_{ij}(\sigma_{ij}(\mathbf{A}_i))$.

Based on the topology, each network relay node receiving a packet can decide to which node it relays the packet and over which frequency channel, in order to minimize the end-to-end delay $d_{ij}(\sigma_{ij}(\mathbf{A}_i))$. Finally, to calculate $d_{ij}(\sigma_{ij}(\mathbf{A}_i))$, the source node needs to obtain the delay information from other nodes according to the actions taken by the relay nodes, i.e.

$$d_{ij}(\sigma_{ij}(\mathbf{A}_i)) = \sum_{n \in \sigma_{ij}} ETT_{nk}(A_n), \quad \text{for } V_i \in C_k. \tag{7}$$
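Equation (7) adds up per-hop expected transmission times along a route. The Python sketch below assumes that equation (5), which is defined earlier in the paper and not reproduced in this section, has the common ETT form $ETT_{nk}(e,f) = L_k / (T_n(e,f)(1 - p_n(e,f)))$. The `Hop` type and the route values are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    rate_bps: float  # T_n(e, f): rate of the chosen link-frequency channel pair
    per: float       # p_n(e, f): packet error rate of that pair


def ett(packet_bits: float, hop: Hop) -> float:
    """Assumed ETT form: packet length divided by the goodput T * (1 - p)."""
    if not 0.0 <= hop.per < 1.0:
        raise ValueError("packet error rate must lie in [0, 1)")
    return packet_bits / (hop.rate_bps * (1.0 - hop.per))


def end_to_end_delay(packet_bits: float, route: list[Hop]) -> float:
    """Equation (7): sum of the per-hop ETT values along the route."""
    return sum(ett(packet_bits, hop) for hop in route)


if __name__ == "__main__":
    L_k = 1000 * 8  # a 1000-byte packet, in bits
    route = [Hop(6e6, 0.0), Hop(6e6, 0.2), Hop(12e6, 0.5)]
    print(f"{end_to_end_delay(L_k, route) * 1e3:.3f} ms")  # prints "4.333 ms"
```

Because each hop's ETT only depends on that hop's rate and error rate, a relay can compute its own term locally. This is what makes the per-node decomposition in the following formulation possible.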


IV. Resource Management Problem Formulation over Multi-hop Cognitive Radio Networks

By examining the cumulative ETT values, each delay-sensitive application aims to minimize its own end-to-end packet delay. The centralized and the proposed distributed problem formulations are provided next.

³ If the channel conditions of some links change severely, a threshold $\nu_{th}$ can be set by the protocols to avoid these fast-changing nodes, and $\nu$ is then selected as the maximum transition rate below this threshold.

•  Centralized  problem  formulation  with  global  information  available  at  the  sources 

If we assume that the global information⁴ $\mathcal{G}_i$ is available at the source node $n_i^s$ of the delay-sensitive application $V_i$, the route $\sigma_{ij}(\mathbf{A}_i, \mathcal{G}_i)$ can be determined for each packet $j$ of $V_i$. The centralized optimization can then be performed at every source node in order to maximize the utility $u_i(\mathbf{A}_i)$. Hence, for application $V_i$, we have:


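$$\mathbf{A}_i^{opt} = \arg\max_{\mathbf{A}_i} u_i(\mathbf{A}_i) \quad \text{subject to } A_n \in \mathcal{A}_n \text{ for all } A_n \in \mathbf{A}_i, \tag{8}$$

where

$$u_i(\mathbf{A}_i) = \sum_{j=1}^{q_i} \lambda_{ij}\, \mathrm{Prob}\{d_{ij}(\sigma_{ij}(\mathbf{A}_i, \mathcal{G}_i)) \le D_{ij}\}, \quad \lambda_{ij} = \lambda_k \text{ and } D_{ij} = D_k \text{ if } j \in C_k. \tag{9}$$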
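The utility in equation (9) weights each packet's probability of meeting its deadline by the impact factor of its class. The sketch below estimates each probability empirically from delay samples observed under one fixed action vector. The sampling setup, the function names, and the numbers are hypothetical; this section does not prescribe how these probabilities are obtained.

```python
from typing import Sequence


def on_time_prob(samples: Sequence[float], deadline: float) -> float:
    """Empirical Prob{d_ij <= D_ij}: fraction of sampled delays within the deadline."""
    if not samples:
        raise ValueError("need at least one delay sample")
    return sum(d <= deadline for d in samples) / len(samples)


def utility(packet_class: Sequence[int],
            impact: dict[int, float],
            deadline: dict[int, float],
            delay_samples: Sequence[Sequence[float]]) -> float:
    """Equation (9): sum over packets j of lambda_ij * Prob{d_ij <= D_ij}, where
    a packet j of class C_k uses lambda_ij = lambda_k and D_ij = D_k."""
    return sum(impact[k] * on_time_prob(samples, deadline[k])
               for k, samples in zip(packet_class, delay_samples))


if __name__ == "__main__":
    # Two packets: one of class 2 and one of class 3 (delays in seconds).
    u = utility([2, 3], {2: 1.0, 3: 0.5}, {2: 0.1, 3: 0.2},
                [[0.05, 0.12, 0.08, 0.09], [0.15, 0.25]])
    print(u)  # 1.0 * 3/4 + 0.5 * 1/2 = 1.0
```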


However, due to the limited wireless network resources, the end-to-end delay constraints $d_{ij}(\sigma_{ij}(\mathbf{A}_i, \mathcal{G}_i)) \le D_k$ can make the optimization infeasible. Hence, suboptimal greedy algorithms that perform the optimization sequentially, from the highest priority class to the lowest priority class, are commonly adopted [25][14]. Specifically, for class $C_k$, the following optimization is considered:


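$$\mathbf{A}_{ik}^{opt} = \arg\min_{\mathbf{A}_{ik}} \sum_{j \in C_k} d_{ij}(\sigma_{ij}(\mathbf{A}_{ik}, \mathcal{G}_i))$$
$$\text{subject to } d_{ij}(\sigma_{ij}(\mathbf{A}_{ik}, \mathcal{G}_i)) \le D_k, \ \forall j \in C_k, \qquad A_n \in \mathcal{A}_n \text{ for all } A_n \in \mathbf{A}_{ik}, \tag{10}$$

where $\mathbf{A}_{ik} = [A_n \mid n \in \sigma_{ij}, \forall j \in C_k]$.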
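To illustrate the class-by-class greedy procedure, the sketch below solves a toy version of (10). Each class $C_k$ has a hypothetical list of candidate joint actions, each with one end-to-end delay (as if the class carried a single packet) and the set of link-frequency pairs it occupies. Classes are processed from the highest priority (smallest $k$) down, and a lower class may not reuse pairs already taken. Enumerating candidates is a simplification for illustration, not the solver used in [25][14].

```python
from typing import Optional

# One candidate joint action A_ik: (label, end-to-end delay, occupied link-frequency pairs)
Candidate = tuple[str, float, frozenset]


def sequential_greedy(candidates: dict[int, list[Candidate]],
                      deadlines: dict[int, float]) -> dict[int, Optional[str]]:
    """Solve the per-class problems one class at a time, highest priority first."""
    used: set = set()                       # pairs claimed by higher priority classes
    chosen: dict[int, Optional[str]] = {}
    for k in sorted(candidates):            # smaller k = higher priority
        feasible = [c for c in candidates[k]
                    if c[1] <= deadlines[k] and not (c[2] & used)]
        if not feasible:
            chosen[k] = None                # class C_k cannot meet its deadline
            continue
        label, _, pairs = min(feasible, key=lambda c: c[1])
        chosen[k] = label
        used |= pairs                       # lower classes may not reuse these pairs
    return chosen


if __name__ == "__main__":
    cands = {
        2: [("a", 10.0, frozenset({("e1", "f1")})), ("b", 12.0, frozenset({("e2", "f2")}))],
        3: [("c", 8.0, frozenset({("e1", "f1")})), ("d", 15.0, frozenset({("e3", "f1")}))],
    }
    print(sequential_greedy(cands, {2: 20.0, 3: 20.0}))  # {2: 'a', 3: 'd'}
```

Class 3 would prefer candidate `c`, but class 2 has already claimed `("e1", "f1")`. This is the priority effect the sequential formulation encodes.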

Due to the informationally decentralized nature of multi-hop wireless networks, the centralized solution is not practical for multi-user delay-sensitive applications, because the tolerable delay does not allow propagating the global information $\mathcal{G}_i$ back and forth throughout the network to a centralized decision maker. For instance, the optimal solution depends on the delays $d_{ij}$ incurred by the various packets across the hops, and these delays cannot be relayed to a source node in time. Moreover, when the network environment is time-varying, the gathered global information $\mathcal{G}_i$ can be inaccurate due to the propagation delay of this information. In addition, the complexity of the centralized optimization grows exponentially with the number of classes and nodes in the network. The problem is further complicated by the dynamic adaptation of the transmission strategies deployed by the wireless nodes, which affects their spectrum access and hence, implicitly, the performance of their neighbor nodes. The optimization would require a large amount of processing time, and the collected information might no longer be accurate by the time transmission decisions need to be made.

⁴ The term "global information" means the information gathered from every node throughout the network. We discuss the required information in Section V.

In  summary,  in  the  studied  dynamic  cognitive  radio  network,  the  decisions  on  how  to  adapt  the 
aforementioned  actions  at  sources  and  relays  need  to  be  performed  in  a  distributed  manner  due  to  these 
informational  constraints.  Hence,  a  “decomposition”  of  the  optimization  problem  into  distributed  strategic 
adaptation  based  on  the  available  local  information  is  necessary. 

•  Proposed  distributed  problem  formulation  with  local  information  at  each  node: 

Instead of gathering the entire global information $\mathcal{G}_i$ at each source, we propose a distributed suboptimal solution that collects the local information $\mathcal{L}_n$ at node $n$ to minimize the expected delay of the various applications sharing the same multi-hop wireless infrastructure. Note that at each node $n$, the end-to-end delay of sending a packet $j \in C_k$ in equation (10) can be decomposed as:

$$d_{ij}(\sigma_{ij}) = d_n^P(\sigma_{ij}) + E[d_n(k, \sigma_{ij})], \tag{11}$$

where $d_n^P(\sigma_{ij})$ represents the past delay that packet $j$ has experienced before arriving at node $n$, and $E[d_n(k,\sigma_{ij})]$ represents the expected delay from node $n$ to the destination of packet $j \in C_k$. The packet $j \in C_k$ to be sent is determined by the application-layer scheduler according to the impact factor $\lambda_k$. The information about $\lambda_k$ can be encapsulated in the packet header, and $d_n^P(\sigma_{ij})$ can be calculated from the timestamp in the packet header. The priority scheduler at each node ensures that the higher priority classes are not influenced by the lower priority classes (see equation (10)). Since the value of $d_n^P(\sigma_{ij})$ is fixed at node $n$, the optimization problem at node $n$ becomes:

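$$A_n^{opt} = \arg\min_{A_n} E[d_n(k, \sigma_{ij}(A_n, \mathcal{L}_n))]$$
$$\text{subject to } E[d_n(k, \sigma_{ij}(A_n, \mathcal{L}_n))] \le D_k - d_n^P(\sigma_{ij}) - \rho, \ j \in C_k, \qquad A_n \in \mathcal{A}_n, \tag{12}$$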

where $E[d_n(k,\sigma_{ij}(A_n,\mathcal{L}_n))]$ represents the expected delay from relay node $n$ to the destination of the packets in class $C_k$. The parameter $\rho$ represents a guard interval chosen so that the probability $\mathrm{Prob}\{E[d_n(k,\sigma_{ij}(A_n,\mathcal{L}_n))] + d_n^P(\sigma_{ij}) > D_k\}$ is small (as in [22]). To estimate the expected delay $E[d_n(k,\sigma_{ij}(A_n,\mathcal{L}_n))]$ in equation (12), each network node $n$ maintains an estimated transmission delay $E[d_n(k)]$ from itself to the destination for each class of traffic, using the Bellman-Ford shortest-delay routing algorithm [17]. We assume that each node $n$ maintains and updates a delay vector $\mathbf{d}_n = [E[d_n(2)], \ldots, E[d_n(K)]]$ with one element per priority class (the first priority class is reserved for the primary users). Each network node exchanges this information with its neighbor nodes and selects the best action $A_n^{opt}$ for the highest priority packet in its buffer. We discuss the minimum-delay routing/channel selection algorithm in Section VI. Note that a group of packets in the buffer of node $n$ can take the same action $A_n$, since the action is determined based on the local information $\mathcal{L}_n$. Since the available channels in cognitive radio networks are time-varying, the information needs to be conveyed to the network nodes in a timely manner for the distributed optimization. Compared to the centralized approach in equation (8), the distributed resource management in equation (12) can adapt better to the dynamic wireless environment by periodically gathering local information. Next, we discuss the distributed resource management with information constraints in more detail.
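The delay vectors $\mathbf{d}_n$ are maintained with Bellman-Ford shortest-delay routing [17]. The sketch below runs synchronous rounds for a single traffic class. In each round, every node reads its neighbors' estimates from the previous round, as it would from the exchanged delay vectors, and sets its own estimate to the minimum of per-hop ETT plus that neighbor's estimate. The topology, the ETT values, and the synchronous schedule are hypothetical simplifications of the asynchronous exchange.

```python
import math


def bellman_ford_delays(ett: dict[str, dict[str, float]], dest: str,
                        max_rounds: int = 100) -> dict[str, float]:
    """Estimate E[d_n(k)] for one traffic class toward `dest`.
    ett[n][m] is the per-hop ETT from node n to neighbor m. In each round every
    node uses its neighbors' estimates from the previous round and sets
    E[d_n] = min_m (ETT(n, m) + E[d_m])."""
    nodes = set(ett) | {m for nbrs in ett.values() for m in nbrs} | {dest}
    delay = {n: (0.0 if n == dest else math.inf) for n in nodes}
    for _ in range(max_rounds):
        updated = dict(delay)
        for n, nbrs in ett.items():
            if n != dest:
                updated[n] = min((cost + delay[m] for m, cost in nbrs.items()),
                                 default=math.inf)
        if updated == delay:      # no estimate changed: converged
            break
        delay = updated
    return delay


if __name__ == "__main__":
    ett = {"s": {"a": 2.0, "b": 1.0}, "a": {"d": 1.0}, "b": {"a": 0.5, "d": 3.0}}
    print(sorted(bellman_ford_delays(ett, "d").items()))
    # [('a', 1.0), ('b', 1.5), ('d', 0.0), ('s', 2.5)]
```

Node `s` first sees a 3.0 route through `a`, then learns the cheaper path through `b` once `b` has heard from `a`. Each extra hop of information costs one exchange round.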


V. Distributed Resource Management with Information Constraints

A.  Considered  medium  access  control 

In this paper, we assume that the required local information $\mathcal{L}_n$ is exchanged over a designated coordination control channel, similar to [13]. Such a coordination channel can be selected from the existing ISM bands, since there is no primary licensee in these bands to interfere with. Transmission is time slotted, and the time slot structure of a node is shown in Figure 2. We denote the time slot duration as $t_I$. The action $A_n$ is selected at each node in every time slot, after the coordination interval (which includes channel sensing for the SOM and information exchange for the IM). We denote the coordination interval at network node $n$ as $d_I(\mathcal{L}_n)$. The goal of the coordination interval in each time slot is to provide the feasible action set $\mathcal{A}_n$ for channel access and relay selection. We discuss how to obtain $\mathcal{A}_n$ from the SOM and the IM of the neighboring nodes when we introduce the proposed algorithm in Section VI.
[Figure: a time slot split into a coordination interval followed by packet transmission.]

Fig. 2. Transmission time line at node $n$ with local information $\mathcal{L}_n$.


Besides the SOM and IM, the information exchanged in the coordination interval should also include the delay vectors $\mathbf{d}_n$ and the control messages for RTS/CTS coordination [8][12]. Note that the local information $\mathcal{L}_n$ does not need to include all of this information in every time slot (except the control messages). For example, the SOM and IM can be collected with a different period, depending on the sensing and information exchange mechanism. Hence, the coordination duration $d_I(\mathcal{L}_n)$ varies across time slots, as discussed in more detail in Section V.C. Next, we investigate the benefit of acquiring information from neighbor nodes at different hop distances, which also affects the duration of the coordination interval $d_I(\mathcal{L}_n)$.

B.  Benefit  of  acquiring  information  and  information  constraints 

For network node $n$, the local information $\mathcal{L}_n$ gathered from different network nodes has a different impact on decreasing the objective function $E[d_n(k,\sigma_{ij}(A_n,\mathcal{L}_n))]$ in equation (12). Let $\mathcal{I}_n(x) = \{\mathbf{I}_k(n_x, A_{n_x}), A_{n_x}, \mathbf{d}_{n_x} \mid n_x \in \mathcal{N}_n^x\}$ denote the local information gathered from the neighbor nodes that are $x$ hops away from node $n$, where $\mathcal{N}_n^x$ represents the set of nodes that are $x$ hops away from node $n$. We define $\mathcal{L}_n(x) = \{\mathcal{I}_n(l) \mid l = 1,\ldots,x\}$ as the local information gathered from all the neighbor nodes within $x$ hops.

Given the local information $\mathcal{L}_n(x)$, we define the optimal expected delay as $K_n(k,x) = E[d_n(k,\sigma_{ij}(A_n^{opt}, \mathcal{L}_n(x)))]$. A larger $x$ yields a smaller expected delay $K_n(k,x)$. The benefit (reward) of the information $\mathcal{I}_n(x)$ for class $C_k$ traffic is denoted as $J_n(k,\mathcal{I}_n(x))$. In a static network, $J_n(k,\mathcal{I}_n(x))$ is defined as:

$$J_n(k,\mathcal{I}_n(x)) \triangleq K_n(k,x-1) - K_n(k,x), \quad \text{if } x > 1. \tag{13}$$

We define $J_n(k,\mathcal{I}_n(1)) = K_n(k,1)$, since $\mathcal{L}_n(1) = \mathcal{I}_n(1)$. The reward of information $J_n(k,\mathcal{I}_n(x))$ can be regarded as the benefit (the decrease of the expected delay $E[d_n(k,\sigma_{ij})]$) obtained when the information $\mathcal{I}_n(x)$ is received by node $n$. The optimal expected delay $K_n(k,x)$, given the information $\mathcal{L}_n(x)$, can then be written as:

$$K_n(k,x) = K_n(k,1) - \sum_{l=2}^{x} J_n(k,\mathcal{I}_n(l)). \tag{14}$$


Equation (14) states that the optimal expected delay is a non-increasing function of $x$ (every reward term is nonnegative in a static network), meaning that smaller expected delays can be achieved as more information is gathered. The improvement is quantified by the rewards of information $J_n(k,\mathcal{I}_n(l))$. Here, we ignore the cost of exchanging such information, which is defined in the next subsection. Figure 3 shows a simple illustrative example of the reward of information at node $n$, which is five hops away from the destination node of class $C_k$ traffic. The more information $\mathcal{I}_n(x)$ is available from nodes $x$ hops away, the smaller the optimal expected delay $K_n(k,x)$ that can be obtained.
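Equations (13) and (14) form a telescoping pair: the rewards are consecutive differences of $K_n(k,x)$, and summing them recovers $K_n(k,x)$. The short Python check below uses hypothetical delay values in msec, and the function names are illustrative.

```python
def rewards_from_delays(K: list[float]) -> list[float]:
    """K[x-1] = K_n(k, x) for x = 1..len(K).
    Returns J with J[x-1] = J_n(k, I_n(x)): J(1) = K(1) by convention and
    J(x) = K(x-1) - K(x) for x > 1 (equation (13))."""
    return [K[0]] + [K[x - 1] - K[x] for x in range(1, len(K))]


def delay_from_rewards(J: list[float], x: int) -> float:
    """Equation (14): K_n(k, x) = K_n(k, 1) - sum_{l=2}^{x} J_n(k, I_n(l))."""
    return J[0] - sum(J[1:x])


if __name__ == "__main__":
    K = [200.0, 150.0, 120.0, 110.0, 105.0]   # hypothetical K_n(k, x), msec
    J = rewards_from_delays(K)
    print(J)                                                      # [200.0, 50.0, 30.0, 10.0, 5.0]
    print([delay_from_rewards(J, x) for x in range(1, 6)] == K)   # True
```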


[Figure: panels plotted against the hop count $x = 1,\ldots,5$: the optimal expected delay $K_n(k,x)$ (msec), the static reward $J_n(k,\mathcal{I}_n(x))$ (msec), and the dynamic reward $J_n^*(k,\mathcal{I}_n(x))$ (msec), with $\mathcal{L}_n(x) = \{\mathcal{I}_n(l) \mid l = 1,\ldots,x\}$.]

Fig. 3. Example of the static reward of information $J_n(k,\mathcal{I}_n(x))$, the dynamic reward of information $J_n^*(k,\mathcal{I}_n(x))$, and the optimal expected delay $K_n(k,x)$ (where the information horizon $h_n(k,\nu) = 3$, the average packet length $L_k = 1000$ bytes, and the average transmission rate $T = 6$ Mbps over the multi-hop network).

Let $\mathbf{J}_n(k) = [J_n(k,\mathcal{I}_n(x)),\ 1 \le x \le H_n]$ denote the reward vector from 1-hop information to $H_n$-hop information, where $H_n = \max\{H_n^I, H_n^k\}$. $H_n^k$ represents the shortest hop count from node $n$ to the destination node of class $C_k$ traffic, and $H_n^I$ represents the interference range of node $n$ in terms of hop count. We also need to consider the hop count $H_n^k$ in case the destination node is within the interference range of node $n$. We assume that the reward vector $\mathbf{J}_n(k)$ is obtained when the network is first deployed and is updated only infrequently, when SUs join or leave the network. Note that all the elements of $\mathbf{J}_n(k)$ are nonnegative, i.e. $J_n(k,\mathcal{I}_n(x)) \ge 0$ for $1 \le x \le H_n$, because knowing additional information cannot increase the expected delay $E[d_n(k,\sigma_{ij})]$ in a static network. However, once we account for the propagation delay of the information exchange across a dynamic network, the dynamic reward of information $J_n^*(k,\mathcal{I}_n(x))$ decreases as the hop count $x$ increases. By the time the information of farther nodes reaches the decision node $n$, it is more likely to be out-of-date (i.e. it no longer reflects the exact network situation, since the network conditions and traffic characteristics are time-varying). Once the information is out-of-date, $J_n^*(k,\mathcal{I}_n(x)) = 0$, i.e. there is no benefit from gathering it. Note that in a dynamic network, once $J_n^*(k,\mathcal{I}_n(x)) = 0$, we also have $J_n^*(k,\mathcal{I}_n(x')) = 0$ for $x < x' \le H_n$.

Therefore, in the dynamic network, we define the information horizon $h_n(k,\nu)$ as

$$h_n(k,\nu) = \max x \quad \text{subject to } J_n^*(k,\mathcal{I}_n(x)) \ge \phi(k,\nu), \ 1 \le x \le H_n, \tag{15}$$

where $\phi(k,\nu) > 0$ represents a minimum delay variation specified by the application, which determines the minimum benefit of receiving local information for class $C_k$ traffic. In fact, $h_n(k,\nu)$ depends on the variation speed $\nu$ of the wireless network conditions (i.e. the transition rate of the Markovian channel condition model, see Section III). In a dynamic network with a higher variation speed $\nu$ (e.g. with high mobility), a higher threshold $\phi(k,\nu)$ is needed to guarantee that the information $\mathcal{I}_n(x)$ is still valuable enough to be exchanged. This results in a smaller information horizon $h_n(k,\nu)$. We illustrate this mobility issue in Section VII. Note that the information horizon $h_n(k,\nu)$ varies across classes of traffic and across locations in the network. Since higher priority class traffic has more network resources than the lower priority classes (i.e. it is scheduled first in the optimization of equation (12)), the thresholds satisfy $\phi(k,\nu) \le \phi(k',\nu)$ if $k < k'$, and thereby $h_n(k,\nu) \ge h_n(k',\nu)$ if $k < k'$. In other words, the information horizon $h_n(k,\nu)$ of a higher priority class $C_k$ is at least as large as the information horizon $h_n(k',\nu)$ of a lower priority class $C_{k'}$.
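Equation (15) picks the largest hop count whose dynamic reward still meets the threshold. The sketch below uses a hypothetical reward vector in msec. It also shows the monotonicity argued above: raising $\phi(k,\nu)$, as a faster-varying network or a lower priority class would require, can only shrink the horizon. The function name is illustrative.

```python
def information_horizon(J_dyn: list[float], phi: float) -> int:
    """Equation (15): the largest hop count x in 1..H_n whose dynamic reward
    J*_n(k, I_n(x)) (stored in J_dyn[x-1]) is at least the threshold phi(k, nu).
    Returns 0 if no hop count qualifies."""
    return max((x for x, r in enumerate(J_dyn, start=1) if r >= phi), default=0)


if __name__ == "__main__":
    J_dyn = [200.0, 40.0, 12.0, 3.0, 0.0]   # hypothetical dynamic rewards, msec
    for phi in (10.0, 20.0, 50.0):
        print(phi, information_horizon(J_dyn, phi))
    # 10.0 3
    # 20.0 2
    # 50.0 1
```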

Although the information horizon $h_n(k,\nu)$ can vary across locations and priority classes depending on the applications, such an implementation is complex, and the adaptation of the information horizon is itself an interesting topic that we leave for future research. For simplicity, we assume in this paper that the information horizon is only a function of the network variation speed $\nu$, i.e. $h_n(k,\nu) = h(\nu)$. The information horizon $h(\nu)$ is determined for the most important class among the SUs in the network. This definition of the information horizon $h(\nu)$ is aligned with [14], in which $h(\nu)$ is defined as the maximum number of hops over which the information can be conveyed within $\tau$, such that the network can be considered unchanged (recall that any network changes within the interval $\tau(\nu) < 1/\nu$ are regarded as negligible).

Based on this information horizon $h(\nu)$, we assume that the network nodes within $h(\nu)$ hops form an information cell. Only the local information $\mathcal{L}_n(h)$ within the information cell is useful to node $n$, since the reward of information is zero beyond it, i.e. $J_n^*(k,\mathcal{I}_n(x)) = 0$ for all $x > h(\nu)$. In the dynamic network, network node $n$ determines its action at time slot $t$ based on the information acquired in the previous time slot $t-1$. The optimization problem in equation (12) can then be written as:

$$A_n^{opt}(t) = \arg\min_{A_n} E[d_n(k,\sigma_{ij}(A_n, \mathcal{L}_n(h,t-1)))]$$
$$\text{subject to } E[d_n(k,\sigma_{ij}(A_n,\mathcal{L}_n(h,t-1)))] \le D_k - d_n^P(\sigma_{ij}) - \rho, \ j \in C_k, \qquad A_n \in \mathcal{A}_n(t-1). \tag{16}$$

Recall that the neighbor nodes of node $n$ are defined as the nodes that can interfere with, or be interfered by, node $n$ (within $H_n^I$ hops). This range may not coincide with the range of the information cell (within $h(\nu)$ hops). If all neighbor nodes are within the $h$-hop information cell, all the necessary information is conveyed to node $n$ in time. Otherwise, the neighbor nodes that are too far away cannot convey their interference information to node $n$ in time, and the solution of equation (16) becomes suboptimal. We refer to this as the "information exchange mismatch" problem.

Figure 4 illustrates two simple network examples with and without the mismatch problem. In Figure 4(b), since the information cell does not cover all the interfering neighbor nodes, the center node $n_2$ can still be interfered with by other secondary users. However, due to the nature of the multi-hop wireless environment, the network nodes that are far away from node $n_2$ have limited interference impact on it. Hence, even though the information horizon $h$ does not match the interference range, the performance degradation of the optimization in equation (16) using the local information $\mathcal{L}_n(h)$ is limited.
[Figure legend: interference range of $n_2$; information horizon.]

Fig. 4. (a) 2-hop information cell network without the information exchange mismatch problem; (b) 1-hop information cell network with the information exchange mismatch problem.


C.  Cost  of  information  exchange 

In the previous subsection, we discussed the reward of information in an $h$-hop information cell while ignoring the negative impact of the information exchange. In this subsection, we discuss the cost (increase of the expected delay) due to this information exchange. Recall that the duration of a time slot is $t_I(\nu)$, which is also the interval between repeated information exchanges in the network. We assume that there are $c$ time slots in $\tau$ seconds, i.e.

$$t_I(\nu) = \frac{\tau(\nu)}{c}. \tag{17}$$

The parameter $c$ defines the frequency of the decision making as well as of the learning process, which are discussed in detail in Section VI. Note that decisions can be made every $t_I(\nu)$ seconds, and this time slot duration is short compared to $\tau$. Hence, the network changes within $t_I(\nu)$ are also negligible.

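Recall that the coordination duration in a time slot for network node $n$ is $d_I(\mathcal{L}_n(h))$. Assume that the information units required for the IM, the action information, and the delay information are $U^{(I)}$, $U^{(A)}$, and $U^{(d)}$, respectively, where $U^{(I)}$ and $U^{(d)}$ are counted per class. Assume that the average number of nodes in an $h$-hop information cell is $N(h)$. The information time overhead of $\mathcal{L}_n(h)$ is then, on average,

$$d_I(\mathcal{L}_n(h)) = N(h)\left[(K-1)\left(U^{(d)} + U^{(I)}\right) + U^{(A)}\right].$$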

Note that even though the information exchange is carried out over a designated coordination channel [13], a network node with a single antenna cannot transmit data and control signals at the same time. This information exchange time overhead decreases the effective transmission rate at node $n$ using link $e$ and frequency channel $f$:


$$T'_n(e,f) = \frac{t_I(\nu) - d_I(\mathcal{L}_n(h))}{t_I(\nu)} \times T_n(e,f). \tag{18}$$


Hence, the effective transmission time at node $n$ using link $e$ and frequency channel $f$ to transmit a packet in class $C_k$ becomes:

$$ETT'_{nk}(e,f) = \frac{t_I(\nu)}{t_I(\nu) - d_I(\mathcal{L}_n(h))} \times ETT_{nk}(e,f). \tag{19}$$
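The overhead model and equations (17) to (19) combine into a few lines of arithmetic. The sketch below assumes that the information units $U^{(I)}$, $U^{(A)}$, and $U^{(d)}$ are expressed directly as transmission time in seconds on the coordination channel. That unit choice, the function names, and all numbers are illustrative assumptions.

```python
def slot_duration(tau: float, c: int) -> float:
    """Equation (17): t_I(nu) = tau(nu) / c."""
    return tau / c


def info_overhead(N_h: float, K: int, U_d: float, U_I: float, U_A: float) -> float:
    """Average coordination time d_I(L_n(h)) = N(h) [(K - 1)(U_d + U_I) + U_A]:
    delay and IM units per secondary class (classes 2..K), action unit once."""
    return N_h * ((K - 1) * (U_d + U_I) + U_A)


def _check(t_I: float, d_I: float) -> None:
    if not 0.0 <= d_I < t_I:
        raise ValueError("coordination interval must be shorter than the time slot")


def effective_rate(rate: float, t_I: float, d_I: float) -> float:
    """Equation (18): only the fraction (t_I - d_I) / t_I of each slot carries data."""
    _check(t_I, d_I)
    return rate * (t_I - d_I) / t_I


def effective_ett(ett: float, t_I: float, d_I: float) -> float:
    """Equation (19): the ETT is inflated by t_I / (t_I - d_I)."""
    _check(t_I, d_I)
    return ett * t_I / (t_I - d_I)


if __name__ == "__main__":
    t_I = slot_duration(tau=0.1, c=10)                        # 10 ms slots
    d_I = info_overhead(N_h=8, K=3, U_d=50e-6, U_I=50e-6, U_A=100e-6)
    print(f"d_I = {d_I * 1e3:.1f} ms")                        # d_I = 2.4 ms
    print(f"{effective_rate(6e6, t_I, d_I) / 1e6:.2f} Mbps")  # 4.56 Mbps
```

Here a 2.4 ms coordination interval in a 10 ms slot scales every ETT by $1/0.76 \approx 1.32$. A larger information cell (larger $N(h)$) would raise this factor further.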


In conclusion, the increase of the effective transmission time degrades the performance of the delay-sensitive applications. The degradation depends on the content of the exchanged local information $\mathcal{L}_n(h)$ and on the network variation speed $\nu$. Hence, the benefit $J_n^*(k,\mathcal{I}_n(x))$ in equation (15) decreases due to this cost of the information. We denote the value of information with this cost consideration as $J'_n(k,\mathcal{I}_n(x))$:


$$J'_n(k,\mathcal{I}_n(x)) = K_n(k,x-1) \times \frac{t_I(\nu)}{t_I(\nu) - d_I(\mathcal{L}_n(x-1))} - K_n(k,x) \times \frac{t_I(\nu)}{t_I(\nu) - d_I(\mathcal{L}_n(x))}, \quad \text{if } x > 1. \tag{20}$$
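The effect of the cost on the horizon can be seen numerically. The sketch below follows our reading of equation (20). Each $K_n(k,x)$ is inflated by $t_I(\nu)/(t_I(\nu) - d_I(\mathcal{L}_n(x)))$, and the cost-aware reward is the drop between consecutive inflated delays. Keeping the $J(1) = K(1)$ convention on the inflated delay is our choice. The delays and per-cell overheads (in msec) are hypothetical, with the overhead growing as the cell widens.

```python
def cost_aware_rewards(K: list[float], d_I: list[float], t_I: float) -> list[float]:
    """K[x-1] = K_n(k, x) and d_I[x-1] = d_I(L_n(x)). Each delay is inflated by
    t_I / (t_I - d_I(L_n(x))), and J'(x) is the drop between consecutive
    inflated delays. Index 0 keeps the J(1) = K(1) convention."""
    if any(not 0.0 <= d < t_I for d in d_I):
        raise ValueError("each coordination interval must be shorter than the slot")
    inflated = [k * t_I / (t_I - d) for k, d in zip(K, d_I)]
    return [inflated[0]] + [inflated[x - 1] - inflated[x] for x in range(1, len(inflated))]


def horizon(rewards: list[float], phi: float) -> int:
    """Largest hop count x whose reward is at least phi (0 if none)."""
    return max((x for x, r in enumerate(rewards, start=1) if r >= phi), default=0)


if __name__ == "__main__":
    K = [200.0, 150.0, 120.0, 110.0, 105.0]   # K_n(k, x) in msec, x = 1..5
    d_I = [1.0, 2.0, 3.0, 4.0, 5.0]           # overhead grows with the cell size (msec)
    J_cost = cost_aware_rewards(K, d_I, t_I=10.0)
    J_free = [K[0]] + [K[x - 1] - K[x] for x in range(1, len(K))]
    print(horizon(J_free, 10.0), horizon(J_cost, 10.0))  # prints "4 3"
```

With these numbers, the rewards of 4-hop and 5-hop information turn negative once the overhead is charged. The horizon therefore drops from 4 to 3 hops.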


The optimal information horizon $h_n(k,\nu)$ in equation (15) also decreases due to this cost. Next, we discuss the proposed distributed resource management algorithm, which relies on the information exchange and on learning capabilities to tackle the optimization problem in equation (16).


VI. Distributed Resource Management Algorithms

Figure 5 provides a system diagram of the proposed distributed resource management. First, a packet $j \in C_k$ is selected by the application scheduler at node $n$ based on the impact factor $\lambda_k$ of the packet, and an action $A_n$ is taken for that packet. The application-layer information, including $C_k$, $L_k$, and $D_k$, is conveyed to the network layer for this action decision. The network conditions $T_n(e,f)$ and $p_n(e,f)$ are then conveyed from the MAC/PHY layer for computing the ETT values using equation (5).

In addition to $T_n(e,f)$ and $p_n(e,f)$, the action selection is affected by the interference induced by the actions of the neighbor nodes, and hence by the information received from the neighbor nodes in the information cell. Recall that $\mathcal{L}_n(h) = \{\mathcal{I}_n(l) \mid l = 1,\ldots,h\}$. We use the notation $-n(h)$ to represent the set of neighbor nodes of network node $n$ in the $h$-hop information cell. Hence, the local information $\mathcal{L}_n(h) = \{\mathbf{I}_k(-n(h), \mathbf{A}_{-n(h)}), \mathbf{A}_{-n(h)}, \mathbf{d}_{-n(h)}\}$ needs to be exchanged across the network nodes. In this way, node $n$ knows the estimated delays $\mathbf{d}_{-n(h)}$ from its neighbor nodes to the destinations, as well as the actions $\mathbf{A}_{-n(h)}$ of its neighbor nodes and their IMs $\mathbf{I}_k(-n(h), \mathbf{A}_{-n(h)})$. Based on the delay information from the neighbor nodes, a network node can update its own estimated delay to the various destinations and determine the minimum-delay action using the Bellman-Ford algorithm [17].


[Figure legend: data transmission; inter-node information exchange; cross-layer message passing.]

Fig. 5. System diagram of the proposed distributed resource management.


As shown in Figure 5, we separate the distributed resource management at node $n$ into two blocks. The information exchange interface block regularly collects the required local information, and the route/channel selection block determines the optimal action. We now discuss the role of the exchanged information and the two algorithms implemented in these blocks.

A.  Distributed  resource  management  algorithms 

The following algorithm is performed at network node $n$ in the information exchange interface of Figure 5.
Algorithm  1.  Periodic  information  exchange  algorithm: 

Step 1. Collect the required information - node $n$ first collects the SOM $\mathbf{Z}_n$ from channel sensing and $\mathcal{L}_n(h) = \{\mathbf{I}_k(-n(h), \mathbf{A}_{-n(h)}), \mathbf{A}_{-n(h)}, \mathbf{d}_{-n(h)}\}$ from the neighbor nodes in the information cell.

Step 2. Learn the behavior of the neighbor nodes - by continuously monitoring the actions of the neighbor nodes, node $n$ can model their behavior, or learn a better transmission strategy, using the strategy vectors $\mathbf{s}(n') = [s_A(n') \mid A = (e \in E_{n'}, f \in F_{n'})]$, $n' \in -n(h)$. Here $s_A(n')$ represents the probability (strategy) that node $n'$ selects action $A$, as discussed in the next subsection.

Step 3. Estimate the resource matrix - from the SOM and the IMs $\mathbf{I}_k(n', A_{n'})$ gathered from the neighbor nodes $n'$, the resource matrix $\mathbf{R}_n^k$ can be obtained for each class of traffic by combining the IMs with the SOM $\mathbf{Z}_n$, as explained in more detail later. The available resources $\mathbf{R}_n^k$ are then provided to the network-layer route/channel selection block described in Algorithm 2.

Step 4. Update the information $\{\mathbf{I}_k(n,A_n), A_n, \mathbf{d}_n\}$ - using the recently selected action $A_n$, the latest delay vector $\mathbf{d}_n$, and the IM $\mathbf{I}_k(n,A_n)$. Two types of interference models are considered in this paper when constructing the IM $\mathbf{I}_k(n,A_n)$ from equation (3):

1) A network node can transmit and receive packets at the same time - a node cannot reuse a frequency channel $f \in F_n$ used by its neighbor nodes. If a frequency channel is used by its neighbor nodes, all the elements in the column of the IM $\mathbf{I}_k(n,A_n)$ associated with that frequency channel are set to 1. The IM is then exchanged with the nodes within the pre-determined information horizon $h$.

2) A network node cannot transmit and receive packets at the same time - in this case, if the frequency channel $f \in F_n$ is used, all the elements in the column of the IM $\mathbf{I}_k(n,A_n)$ associated with that frequency channel are set to 1. In addition, if a network link $e \in E_n$ is used by its neighbor nodes, all the elements of the IM $\mathbf{I}_k(n,A_n)$ associated with node $n$ are also set to 1, regardless of the frequency channel. The IM is then exchanged with the nodes within the pre-determined information horizon $h$.

Step 5. Broadcast the information $\{\mathbf{I}_k(n,A_n), A_n, \mathbf{d}_n\}$ and repeat the algorithm periodically, every $t_I(\nu)$ seconds.
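The marking rules of Step 4 and the SOM/IM combination of Step 3 can be sketched with plain Python dictionaries for a single traffic class. The representation is our assumption, because equations (3) and (4) are not reproduced in this section. The IM of node $n$ is taken as a 0/1 map over its link-frequency pairs (1 = blocked). The SOM marks each channel as free (1) or occupied by a primary user (0). In the second model, "all elements associated with node $n$" is read as every entry of node $n$'s IM. The function names are hypothetical.

```python
from itertools import product


def build_im(links: list[str], channels: list[str],
             neighbor_channels: set[str], busy_links: set[str],
             half_duplex: bool) -> dict[tuple[str, str], int]:
    """IM of node n: entry (e, f) = 1 marks a blocked link-frequency pair.
    Both models block the whole column of every channel used by a neighbor;
    the half-duplex model also blocks every entry when a neighbor uses one of
    node n's links, whatever channel it is on."""
    im = {(e, f): int(f in neighbor_channels) for e, f in product(links, channels)}
    if half_duplex and busy_links & set(links):
        im = {pair: 1 for pair in im}
    return im


def feasible_actions(im: dict[tuple[str, str], int],
                     som: dict[str, int]) -> list[tuple[str, str]]:
    """Keep the pairs whose channel is free of primary users and not blocked by the IM."""
    return sorted(pair for pair, blocked in im.items()
                  if som[pair[1]] == 1 and not blocked)


if __name__ == "__main__":
    links, channels = ["e1", "e2"], ["f1", "f2", "f3"]
    som = {"f1": 1, "f2": 1, "f3": 0}  # f3 is occupied by a primary user
    for hd in (False, True):
        im = build_im(links, channels, {"f2"}, {"e2"}, half_duplex=hd)
        print(hd, feasible_actions(im, som))
    # False [('e1', 'f1'), ('e2', 'f1')]
    # True []
```

The half-duplex model is much more restrictive in this toy case. Once a neighbor occupies link `e2`, node $n$ has no feasible action in that slot.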

The following algorithm is performed at network node $n$ in the network-layer minimum-delay route/channel selection block of Figure 5.

Algorithm  2.  Minimum-delay  route/channel  selection  algorithm: 

Step 1. Determine the packet to transmit - based on the impact factor, one packet $j$ in the buffer of node $n$ is scheduled for transmission. Assume that packet $j \in C_k$. The information $C_k$, $L_k$, and $D_k - d_n^P$ is extracted or computed from the application layer.

Step 2. Construct the feasible action set - construct the feasible action set $\mathcal{A}_n(k)$ for priority class $C_k$ at node $n$ from the resource matrix $\mathbf{R}_n^k$ provided by the information exchange interface (see equation (4)).

Step 3. Estimate the channel condition - the transmission rate $T_n(e,f)$ and the packet error rate $p_n(e,f)$ of each link-frequency channel pair ($e \in E_n$, $f \in F_n$) are provided by the PHY/MAC layer through link adaptation [20].

Step 4. Calculate the expected delay toward the destination - for each action $A_n \in \mathcal{A}_n(k)$ for traffic class $C_k$:

$$E[d_n(k,A_n)] = ETT_{nk}(A_n) + E[d_{n'(A_n)}(k)], \quad \text{for } A_n \in \mathcal{A}_n(k), \tag{21}$$

where $E[d_{n'(A_n)}(k)]$ represents the element for class $C_k$ in the delay vector $\mathbf{d}_{n'(A_n)}$ of the neighbor node $n'(A_n)$ selected by action $A_n$. $ETT_{nk}(A_n)$ can be calculated from $L_k$, $T_n(e,f)$, and $p_n(e,f)$ using equation (5).
Step 5. Check the delay deadline - if $E[d_n(k)] > D_k - d_n^P - \rho$, drop the packet.

Step 6. Select the minimum-delay action - if $E[d_n(k)] \le D_k - d_n^P - \rho$, find the minimum-delay route and frequency channel, i.e. determine the optimal action $A_n^{opt}$ from the feasible action set $\mathcal{A}_n(k)$. In other words, the goal here is to solve equation (16) at node $n$:

$$A_n^{opt} = \arg\min_{A_n \in \mathcal{A}_n(k)} E[d_n(k,A_n)]. \tag{22}$$


Note that the feasible action set $\mathcal{A}_n(k)$ in equation (22) depends on the actions $\mathbf{A}_{-n}$ of the neighbor nodes. It is therefore important for the network nodes to adopt learning approaches that model the behaviors of these nodes, in order to decrease the complexity of the dynamic adaptation. This is discussed in the next subsection.

Step 7. Send an RTS request - after determining the next relay and frequency channel, send an RTS request indicating the selected action $A_n^{opt}$ to the next relay.

Step 8. Wait for the CTS response and transmit the packets.

Step 9. Update the delay and the current action information - after selecting the optimal action, update the estimated delay $E[d_n(k)]$ using an exponential moving average with smoothing factor $\alpha$:

$$E[d_n(k)] = \alpha \times E[d_n(k)]^{old} + (1-\alpha) \times E[d_n(k, A_n^{opt})], \tag{23}$$

and provide the updated delay vector $\mathbf{d}_n = [E[d_n(2)], \ldots, E[d_n(K)]]$ to Algorithm 1 at the information exchange interface. Figure 6 provides a block diagram of the proposed distributed resource management. For the blocks that are beyond the scope of this paper, we refer to [4][5] for channel sensing, [8][12] for RTS/CTS coordination, and [17] for the delay vectors.
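Steps 4 to 6 and Step 9 of Algorithm 2 reduce to a small amount of bookkeeping. The sketch below is one reading of them. The deadline check of Step 5 is applied to the best expected delay among the feasible actions. The ETT values, next hops, and neighbor delays are assumed to be supplied by the blocks described above. All names and numbers are hypothetical.

```python
import math
from typing import Optional

Action = tuple[str, str]  # (link e, frequency channel f)


def select_action(ett: dict[Action, float],
                  next_hop: dict[Action, str],
                  neighbor_delay: dict[str, float],
                  slack: float) -> tuple[Optional[Action], float]:
    """Steps 4-6 for one packet of class C_k.
    ett: ETT_nk(A_n) for each feasible action A_n in A_n(k).
    next_hop: neighbor n'(A_n) reached by each action.
    neighbor_delay: E[d_n'(k)] taken from each neighbor's delay vector.
    slack: D_k - d_n^P - rho.
    Returns (A_n^opt, its expected delay), or (None, delay) when the packet is dropped."""
    if not ett:
        return None, math.inf
    expected = {a: t + neighbor_delay[next_hop[a]] for a, t in ett.items()}  # eq. (21)
    best = min(expected, key=expected.get)                                   # eq. (22)
    if expected[best] > slack:   # Step 5: even the best action misses the deadline
        return None, expected[best]
    return best, expected[best]


def ema_update(old: float, new: float, alpha: float) -> float:
    """Equation (23): exponential moving average with smoothing factor alpha."""
    return alpha * old + (1.0 - alpha) * new


if __name__ == "__main__":
    ett = {("e1", "f1"): 2.0, ("e2", "f3"): 1.0}
    hop = {("e1", "f1"): "a", ("e2", "f3"): "b"}
    nbr = {"a": 5.0, "b": 7.0}
    action, delay = select_action(ett, hop, nbr, slack=10.0)
    print(action, delay)              # ('e1', 'f1') 7.0
    print(ema_update(8.0, delay, alpha=0.9))
```

The cheaper first hop (`e2` on `f3`) loses because its next relay reports a larger remaining delay. Equation (21) adds both terms for exactly this reason.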

[Figure 6: block diagram with the periodic information exchange algorithm, the minimum-delay route/channel selection algorithm, and the information update; the legend marks the blocks that are not covered in this paper.]


Fig.  6.  Block  diagram  of  the  proposed  distributed  resource  management  at  network  node  n  . 


B.  Adaptive  fictitious  play  (AFP) 

We now provide a learning approach that enables the SUs to learn the feasible action set $\mathcal{A}_n(k)$ in equation (22) for our distributed resource management algorithm. Specifically, based on the information exchange $\mathcal{L}_n(h)$, the behaviors of the neighbor nodes in the information cell can be learned (Step 2 of Algorithm 1), and based on these behaviors, the feasible action set $\mathcal{A}_n(k)$ is determined. This motivates us to apply a well-known learning approach, fictitious play [15], which applies when the SUs are willing⁵ to reveal their current action information, so that they are able to model the behaviors (strategies) of the other SUs (a model-based learning [18]). However, due to the information constraint discussed in the previous section, only the information from the neighbor nodes in the information cell is useful. Hence, we adapt the fictitious play learning approach to our network setting. Figure 7(a) provides a block diagram of the proposed distributed resource management algorithm using adaptive fictitious play.

Note that, depending on the information horizon, only some of the SUs can be modeled via the learning approach. Specifically, node n maintains over time a strategy vector $\mathbf{s}(n',t) = [s_A(n',t) \mid A = (e, f),\ e \in E_{n'},\ f \in F_{n'}]$ for each of its neighbor nodes $n' \in -n(h)$ in the information cell.


5 If the action information is not provided by the other secondary users, a node can learn its own strategy from its action payoffs, i.e., the estimated delay $E[d_n(k)]$. This learning approach is reinforcement learning (a model-free learning [18], also called payoff-based learning).
$s_A(n',t)$ represents the frequency selection strategy of node n', i.e., the probability that node n' takes action A at time t, and is obtained as:

$$s_A(n',t) = \frac{r_A(n',t)}{\sum_{A' \in (E_{n'}, F_{n'})} r_{A'}(n',t)}, \quad (24)$$


where $r_A(n',t)$ is the propensity [16] of node n' for taking action A at time t, which can be computed by:

$$r_A(n',t) = \alpha \times r_A(n',t-1) + I(A_{n'}(t) = A), \quad (25)$$

where $\alpha < 1$ is a discount factor quantifying the importance of the history. $I(A_{n'}(t) = A)$ is an indicator function such that

$$I(A_{n'}(t) = A) = \begin{cases} 1, & \text{if the action of node } n' \text{ at time } t \text{ is } A, \\ 0, & \text{otherwise}. \end{cases} \quad (26)$$
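The propensity update and normalization of equations (24) to (26) can be sketched as follows for a single neighbor n'. The class `NeighborModel`, its method names, and the uniform strategy returned before any observation are hypothetical choices for illustration; actions are represented as hashable (link, channel) pairs.

```python
class NeighborModel:
    """Hypothetical fictitious-play model that node n keeps for one neighbor n' (eqs. (24)-(26))."""

    def __init__(self, actions, alpha):
        if not 0 <= alpha < 1:
            raise ValueError("the discount factor must satisfy 0 <= alpha < 1")
        self.alpha = alpha
        # Propensities r_A(n', t), one per action A = (e, f) of the neighbor.
        self.propensity = {a: 0.0 for a in actions}

    def observe(self, observed_action):
        """Eq. (25) with the indicator of eq. (26): discount all propensities, add 1 to the observed one."""
        for a in self.propensity:
            hit = 1.0 if a == observed_action else 0.0
            self.propensity[a] = self.alpha * self.propensity[a] + hit

    def strategy(self):
        """Eq. (24): normalize the propensities into probabilities s_A(n', t)."""
        total = sum(self.propensity.values())
        if total == 0.0:
            # No observations yet: assume a uniform strategy (not specified in the text).
            return {a: 1.0 / len(self.propensity) for a in self.propensity}
        return {a: r / total for a, r in self.propensity.items()}


# Usage: a neighbor with two (link, channel) actions, observed over two time slots.
model = NeighborModel([("e1", 1), ("e1", 2)], alpha=0.5)
model.observe(("e1", 1))
model.observe(("e1", 2))
s = model.strategy()  # about 1/3 for ("e1", 1) and 2/3 for ("e1", 2)
```

Because every propensity is discounted by α at each observation, recent actions weigh more than old ones, so the modeled strategy can follow a neighbor whose channel choices drift over time.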


Figure 7(b) shows how the network variation speed v affects the size of the information cell and, ultimately, the video performance. We consider the mobility of the network relays to show the impact of this network variation in the next section.

[Figure 7: panels (a) and (b); panel (b) is annotated "Adapt the horizon to optimize the performance".]

Fig. 7. (a) Block diagram of the proposed distributed resource management algorithm using the AFP. (b) Impact of the network variation on the FP and the video performance.

As stated in Section III, $s_A(n',t)$ represents the probability that network node n' chooses action A. Hence, the probability $s_A(n',t)$ that models node n' taking action A at time t increases with the number of times that action A is actually selected. Based on the strategy $s_A(n',t)$, the adaptive fictitious play provides the estimated IM $\hat{\mathbf{I}}_k$, from which the feasible action set $\mathcal{A}_n(k)$ can be computed.

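From the IMs $\mathbf{I}_k(n', A_{n'})$ gathered from the neighbor nodes $n' \in -n(h)$, node n can compute the expected IM:

$$\hat{\mathbf{I}}_k = [\hat{I}_{ij}^k] = \sum_{n' \in -n(h)} \hat{\mathbf{I}}_k(n') = \sum_{n' \in -n(h)} \sum_{A} s_A(n',t)\, \mathbf{I}_k(n', A). \quad (27)$$

Then, node n can estimate the IM $\mathbf{I}_k = [I_{ij}^k]$ for the traffic in class $C_k$:

$$I_{ij}^k = \begin{cases} 1, & \text{if } \hat{I}_{ij}^k > \eta, \\ 0, & \text{if } \hat{I}_{ij}^k \le \eta, \end{cases} \quad (28)$$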


where $\eta$ is a threshold value that determines whether or not a link-frequency-channel pair (e, f) is considered to be occupied. The feasible action set $\mathcal{A}_n(k)$ can hence be learned from the resource matrix $\mathbf{R}_n^k = \mathbf{R}_n \ominus \mathbf{I}_1 \ominus \cdots \ominus \mathbf{I}_k$ using equation (4). Having learned the feasible action set $\mathcal{A}_n(k)$, node n computes its best-response action using equation (22).
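A minimal sketch of equations (27) and (28), assuming each IM is stored as a links × channels 0/1 matrix (a list of lists) and that each neighbor reports one such matrix per action. The function names, the data layout, and the parameter name `eta` for the threshold η are assumptions; the subsequent step of deriving $\mathcal{A}_n(k)$ from the resource matrix via equation (4) is not reproduced here.

```python
def expected_im(neighbors, num_links, num_channels):
    """Eq. (27): sum over neighbors n' and their actions A of s_A(n', t) * I_k(n', A).

    neighbors: one (strategy, im_of_action) pair per neighbor n' in the information cell, where
    strategy maps A -> s_A(n', t) and im_of_action maps A -> a num_links x num_channels
    matrix (list of lists) I_k(n', A) reported by that neighbor.
    """
    exp_im = [[0.0] * num_channels for _ in range(num_links)]
    for strategy, im_of_action in neighbors:
        for action, prob in strategy.items():
            im = im_of_action[action]
            for i in range(num_links):
                for j in range(num_channels):
                    exp_im[i][j] += prob * im[i][j]
    return exp_im


def threshold_im(exp_im, eta):
    """Eq. (28): mark link-channel pair (i, j) as occupied (1) if its expected IM entry exceeds eta."""
    return [[1 if v > eta else 0 for v in row] for row in exp_im]


# Usage: one neighbor using channel 0 with probability 0.8 and channel 1 with probability 0.2;
# each action interferes with both links on the channel it uses (made-up data).
neighbor = (
    {"ch0": 0.8, "ch1": 0.2},
    {"ch0": [[1, 0], [1, 0]], "ch1": [[0, 1], [0, 1]]},
)
occupied = threshold_im(expected_im([neighbor], num_links=2, num_channels=2), eta=0.5)
# occupied == [[1, 0], [1, 0]]: channel 0 is treated as busy on both links
```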


C.  Information  exchange  overhead  reduction 

Fictitious play suffers from a large information overhead, since it requires all the local information $\mathcal{L}_n(h)$ from the nodes $-n(h)$ in the h-hop information cell. From the cost of information exchange in equation (20), we know that the overhead can increase the expected delay, especially when the network changes slowly (i.e., with a large information cell). Hence, overhead reduction is required to mitigate the performance degradation.

(1)  Reducing  the  information  horizon. 

Recall that the average information overhead of $\mathcal{L}_n(h)$ is $N(h)[(K-1)(U^{(I)} + U^{(d)}) + U^{(A)}]$, where N(h) is the average number of nodes in an h-hop information cell. With an information horizon h' < h, the overhead becomes $N(h')[(K-1)(U^{(I)} + U^{(d)}) + U^{(A)}]$, where N(h') < N(h). Note that decreasing the overhead by reducing the information horizon is not always beneficial; there is a trade-off, as discussed in Section V. The reward of information $J_n(k, \mathcal{L}_n(x))$, x < h, in equation (15) provides a metric for selecting the most valuable information from the nodes within the information cell.
(2) Reducing the number of classes.

From equation (12), we know that the higher-priority classes are not influenced by the lower-priority classes. Hence, the information overhead can be reduced by not exchanging the information of the lower-priority classes. When only the information of the first k' < K classes is exchanged, the overhead becomes $N(h)[(k'-1)(U^{(I)} + U^{(d)}) + U^{(A)}]$.

(3) Reducing the frequency of learning.

Although each period of τ seconds is divided into c time slots, a network node n does not have to learn in all of these c time slots. In other words, the periodic learning process of node n does not have to be aligned with the information exchange (decision making). To avoid simultaneous learning among network neighbors in a distributed manner, at each time slot node n updates the strategy vector $s_A(n',t)$ with probability $\varepsilon_n = b_n / c$ ($b_n \le c$) and keeps the same strategy vector with probability $1 - \varepsilon_n$. In other words, node n uses, on average, $b_n$ of the c time slots in τ seconds to model the behavior of its neighbor nodes. The parameter $b_n$ thus characterizes the learning speed of node n: a larger $b_n$ gives node n faster learning. The information overhead of $\mathcal{L}_n(h)$ becomes $(b_n / c) \times N(h)[(K-1)(U^{(I)} + U^{(d)}) + U^{(A)}]$.
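The three overhead reductions above act on different factors of the same expression, which the following sketch makes explicit. It assumes that $U^{(I)}$, $U^{(d)}$, and $U^{(A)}$ denote the per-node sizes of the exchanged IM, delay, and action information; the function names and all numeric sizes are hypothetical.

```python
import random


def info_overhead(n_nodes, num_classes, u_im, u_delay, u_action, learn_fraction=1.0):
    """Average overhead (b_n / c) * N(h) * [(K - 1) * (U_I + U_d) + U_A].

    n_nodes: N(h), or N(h') for a reduced horizon (strategy 1).
    num_classes: K, or k' < K when only the higher-priority classes are exchanged (strategy 2).
    learn_fraction: b_n / c, the learning frequency (strategy 3).
    """
    return learn_fraction * n_nodes * ((num_classes - 1) * (u_im + u_delay) + u_action)


def learns_this_slot(rng, b_n, c):
    """Strategy 3: update the strategy vectors in this slot with probability eps_n = b_n / c."""
    return rng.random() < b_n / c


# Usage with hypothetical sizes U_I = U_d = 10 and U_A = 4 (arbitrary units), K = 9.
full = info_overhead(6, 9, 10, 10, 4)                                # N(h) = 6 -> 984.0
smaller_horizon = info_overhead(3, 9, 10, 10, 4)                     # N(h') = 3 -> 492.0
fewer_classes = info_overhead(6, 5, 10, 10, 4)                       # k' = 5 -> 504.0
half_learning = info_overhead(6, 9, 10, 10, 4, learn_fraction=0.5)   # b_n/c = 0.5 -> 492.0
rng = random.Random(1)
slots = [learns_this_slot(rng, b_n=3, c=4) for _ in range(8)]  # slots used for learning
```

Drawing the learning slots independently at random, rather than on a fixed schedule, is what keeps neighboring nodes from tending to update their models in the same slots.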

VII. Simulation Results

We simulate two video streaming applications that transmit the videos V1 "Coastguard" and V2 "Mobile" (16 frames per GOP, frame rate of 30 Hz, CIF format) over the same multi-hop cognitive radio network. Each video sequence is divided into four priority classes ($K_i = 4$, $K = 9$) with average packet length $L_k$ = 1000 bytes and delay deadline $D_k$ = 500 milliseconds. Although the first priority class $C_1$ is reserved for the primary users, we first consider the case in which there are no primary users, i.e., only the SUs and NRs are transmitting. We assume that there are two frequency channels (M = 2). The wireless network topology, shown in Figure 8, covers a 100 × 100 meter region with N = 15 nodes and L = 22 links, similar to the network settings in [21]. A link is established as long as the channel condition (described in this paper by the link SINR) is acceptable within the transmission distance (approximately 36 meters). Note that this transmission distance is not the same as the interference range $H_n$: neighbor nodes that are beyond the transmission distance can still interfere with each other.
Fig. 8. Wireless network settings for the simulation of two video streams.

A.  The  reward  and  cost  of  the  information  exchange 

First, we simulate the impact of the information, i.e., its reward $J_n$ (see equation (13)) and cost (see equation (20)), on the expected delay $E[d_n]$ when using the adaptive fictitious play of Section VI with different information horizons. Figure 9 shows the resulting reward and cost of information at different locations for streaming video $V_1$ (at nodes n = 1, 7, and 13 on one of the routes of video $V_1$). The results show that a 1-hop information cell is sufficient when the interference range is 40 meters, since only nodes that are 1 hop away can interfere with each other. If the interference range is 80 meters, the information exchange mismatch problem (see Section V) occurs, and the appropriate information horizon for information exchange increases to 2.
Fig. 9. Reward and cost $J_n'$ of different information horizons at different nodes for video $V_1$.
B.  Application  layer  performance  with  different  information  horizons  and  interference  ranges 


We next compare the proposed dynamic resource management algorithm using adaptive fictitious play (AFP) with two other resource management methods: AODV [23] with load balancing over the two available frequency channels (AODV/LB), and the Dynamic Least Interference Channel Selection [24] (DCS) extended to a network setting. Tables II and III show the Y-PSNR of the two video sequences using the different approaches. The results show that the proposed algorithm, which learns from the nodes within the information cell, outperforms the alternative approaches. In particular, when the interference range is large ($H_n$ = 80 meters), the proposed AFP approach significantly improves the video quality (X denotes a PSNR below 26 dB, which is unacceptable for a viewer).

For delay-sensitive applications, we measure the packet loss rate (i.e., the probability that the end-to-end delay exceeds the delay deadline) of the different approaches in Figure 10(a), which shows the results of both applications. AODV represents the on-demand routing solution with only one frequency channel. The AODV/LB approach randomly distributes packets over the two available frequency channels. The DCS approach, with its cognitive ability, selects a better frequency channel based on link measurements and hence improves the performance compared to AODV/LB. AFP further improves the performance of both applications by learning the behaviors of the neighbor nodes. Interestingly, the benefit brought by the learning capability decreases as the network bandwidth increases. In other words, it is not worthwhile to be too intelligent in an environment with plentiful resources. Moreover, as shown in Figure 10(b), the improvement from a 2-hop information cell is limited when the interference range is 40 meters. This is because the nodes that are two hops away have no impact on the current node, so their information is not valuable (i.e., it does not affect the utility).
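The packet loss metric plotted in Figure 10 can be computed as below. This is a hypothetical helper, not from the paper; it assumes that per-packet end-to-end delays are logged and that packets dropped in transit (e.g., at Step 5) are recorded with infinite delay.

```python
import math


def packet_loss_rate(end_to_end_delays, deadline):
    """Fraction of packets whose end-to-end delay exceeds the deadline D_k.

    Packets dropped in transit can be passed as math.inf so that they count as lost.
    """
    if not end_to_end_delays:
        return 0.0
    late = sum(1 for d in end_to_end_delays if d > deadline)
    return late / len(end_to_end_delays)


# Usage: made-up delays in seconds against D_k = 0.5 s; two of the four packets are lost.
rate = packet_loss_rate([0.12, 0.61, 0.33, math.inf], deadline=0.5)  # 0.5
```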


Table  I. 

The  characteristic  parameters  of  the  video  classes  of  the  two  video  sequences. 


Video                  Rate        λ_k (dB/Kbps) of priority classes 1 to 4
Video 2 "Coastguard"   1500 Kbps   0.0105   0.0064   0.0048   0.0042
Video 1 "Mobile"       1668 Kbps   0.0170   0.0064   0.0042   0.0031

Table  II. 

Y-PSNR OF THE TWO VIDEO SEQUENCES USING VARIOUS APPROACHES ($H_n$ = 40 METERS)

Y-PSNR (dB)
Network bandwidth      Video   AODV/LB   DCS     AFP (1-hop information cell)
Average T = 5.5 Mbps   V1      32.47     35.21   35.61
                       V2      31.70     33.32   33.32
Table III.

Y-PSNR OF THE TWO VIDEO SEQUENCES USING VARIOUS APPROACHES ($H_n$ = 80 METERS)

Y-PSNR (dB)
Network bandwidth      Video   AODV/LB   DCS     AFP (1-hop information cell)   AFP (2-hop information cell)
Average T = 5.5 Mbps   V1      X         X       28.19                          29.80
                       V2      X         X       31.26                          31.70
Average T = 10 Mbps    V1      30.47     34.46   35.61                          35.61
                       V2      31.92     33.08   33.32                          33.32
Fig. 10. (a) Packet loss rate vs. average transmission bandwidth using different approaches ($H_n$ = 80 meters). (b) Packet loss rate vs. average transmission bandwidth using different approaches ($H_n$ = 40 meters).

C.  Reducing  the  frequency  of  learning 


When the interference range is 40 meters, Figure 10(b) shows that AFP with a 1-hop information cell performs better than with a 2-hop information cell, since the 1-hop information cell has a smaller cost of information exchange. In addition to reducing the information horizon, reducing the learning frequency $b_n / c$ at all nodes can also reduce the cost of information exchange. Figure 11 shows the packet loss rate of the two applications with different information horizons as $b_n / c$ changes from 1 to 0.5. As the learning frequency $b_n / c$ decreases, the cost of information exchange decreases, and so does the packet loss rate. However, when $b_n / c < 0.6$, the learning efficiency drops, AFP becomes inefficient, and the packet loss rate starts increasing for both applications. In other words, changing the learning frequency also leads to a trade-off between learning efficiency and information overhead.
Fig. 11. Packet loss rate vs. learning frequency $b_n / c$ (average T = 5.5 Mbps, $H_n$ = 80 meters).

D. Impact of the primary users

The simulations indicate that the reward of information is also affected by the presence of primary users. Next, we consider the impact of the primary users, which always have higher priority than the network nodes in Figure 8 to access the pre-assigned frequency channels. Assume that frequency channel $F_1$ is occupied by the primary users for a time fraction p = 0%, 20%, 40%, 60%, or 80% around a certain congestion region (network nodes n = 7, 11, 12) in Figure 8. Figure 12 shows the packet loss rate of the two video streams using AFP with various information horizons. The average transmission rate is set to 5.5 Mbps, $b_n / c = 1$, and the interference range is 80 meters.

The results show that as the time fraction p increases, the packet loss rates of both applications increase, since fewer resources are available for the secondary users to transmit their packets. As in the previous simulations with an interference range of 80 meters, AFP with a 2-hop information cell still performs better than with a 1-hop information cell. Interestingly, for application $V_1$, AFP with a 3-hop information cell performs even better when p is large, even though it incurs a larger information cost. This is because the congestion region is more likely to be discovered at the source node n = 1, which can then detour the packets through other routes. However, application $V_2$ cannot exploit this advantage, since its destination node is affected by the primary users and there is no way to detour the packets. Note that when there are no primary users (p = 0), AFP with a 3-hop information cell performs worse than the 2-hop case due to the larger cost of information exchange.
Fig. 12. Packet loss rate vs. time fraction p of the primary users occupying frequency channel $F_1$ around network nodes n = 7, 11, 12 (average T = 5.5 Mbps, $b_n / c = 1$, $H_n$ = 80 meters).


E.  Impact  of  mobility 

In this subsection, we consider the impact of mobility on the video performance. We adopt a well-known mobility model, the "random walk" [26], in which the relay nodes (secondary users) shown in Figure 8 randomly select a direction at each time slot and move at a fixed speed v. We simulate speeds v ranging from 0 to 1 m/s. We assume that there are no primary users, i.e., p = 0. The average transmission rate is set to 8 Mbps, $b_n / c = 1$, and the interference range is 80 meters. Figure 13 illustrates the packet loss rate as the mobility changes for different information horizons. The results show that mobility degrades the performance of both applications. When the mobility v is small, AFP with information horizon h = 2 performs better than with h = 1, as in the previous simulations with $H_n$ = 80 meters. However, for video $V_2$, when the mobility exceeds 0.6 m/s, the best information horizon changes from h = 2 to h = 1. This is because increased mobility decreases the accuracy of the exchanged information, and hence the required information horizon also decreases. Note that for video $V_1$, AFP with information horizon h = 2 still performs better than with h = 1. This is because video $V_1$ has a longer route, and thus modeling more interfering neighbor nodes with a larger information horizon is still beneficial.
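A sketch of the random-walk mobility model as described above, together with a distance-only link test based on the approximately 36-meter transmission distance stated at the start of this section. Assumptions not taken from the paper: the slot duration `dt`, clipping positions to the 100 × 100 meter region at the boundary, and ignoring the SINR condition when deciding whether a link exists. The function names are hypothetical.

```python
import math
import random


def random_walk_step(positions, speed, dt, rng, side=100.0):
    """Move every relay speed * dt meters in a uniformly random direction (one time slot).

    Positions that would leave the side x side region are clipped to its boundary
    (an assumed boundary rule).
    """
    moved = []
    for x, y in positions:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        step = speed * dt
        nx = min(max(x + step * math.cos(theta), 0.0), side)
        ny = min(max(y + step * math.sin(theta), 0.0), side)
        moved.append((nx, ny))
    return moved


def links_within(positions, max_dist=36.0):
    """Node pairs (i, j) at most max_dist meters apart; the SINR condition is ignored here."""
    return [
        (i, j)
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if math.dist(positions[i], positions[j]) <= max_dist
    ]


# Usage: three relays moving at v = 0.6 m/s for 10 slots of an assumed 1 s each.
rng = random.Random(42)
relays = [(20.0, 20.0), (50.0, 40.0), (80.0, 60.0)]
for _ in range(10):
    relays = random_walk_step(relays, speed=0.6, dt=1.0, rng=rng)
print(links_within(relays))
```

Recomputing the link set after each step shows why a large information horizon loses value at higher speeds: the topology that the exchanged information describes changes while it is still in use.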
Fig. 13. Packet loss rate vs. mobility v of the secondary users (network relays) (average T = 8 Mbps, p = 0, $b_n / c = 1$, $H_n$ = 80 meters).

VIII. Conclusions

In this paper, we show that the distributed resource management solution using adaptive fictitious play significantly improves the performance of delay-sensitive applications transmitted over a multi-hop cognitive radio network. We assume that the autonomous secondary users are able to learn the spectrum opportunities based on the information exchange. The proposed approach can also be used to support QoS in general multi-radio wireless networks when there is no primary user. This situation is also discussed in [4], where the secondary users compete in an unlicensed band (e.g., the ISM band) in which there is no primary user. Importantly, we define the information horizon in our adaptive fictitious play based on the value of the obtained information (i.e., its impact on decreasing the expected end-to-end delay). In addition to this reward, the cost of the information exchange is also considered in terms of transmission time overhead. Various approaches for decreasing this time overhead are discussed, and their performance impact is quantified.

In this paper, the information horizon is assumed to be fixed for all priority classes over the whole wireless network. However, our simulation results show that the benefit of a given information horizon can differ across applications with different delays and quality impacts, especially when primary users are present at different locations in the network. Determining the optimal information horizons as the applications and network conditions change is an interesting topic for future research on multi-hop cognitive radio networks.